Abstract

Inspired by the context of compressing encrypted sources, this paper considers the general tradeoff between rate, 
end-to-end delay, and probability of error for lossless source coding with side-information. The notion of end-to-end 
delay is made precise by considering a sequential setting in which source symbols are revealed in real time and 
need to be reconstructed at the decoder within a certain fixed latency requirement. Upper bounds are derived on 
the reliability functions with delay when side-information is known only to the decoder as well as when it is also 
known at the encoder. 

When the encoder is not ignorant of the side-information (including the trivial case when there is no side- 
information), it is possible to have substantially better tradeoffs between delay and probability of error at all rates. 
This shows that there is a fundamental price of ignorance in terms of end-to-end delay when the encoder is not 
aware of the side information. This effect is not visible if only fixed-block-length codes are considered. In this way, 
side-information in source-coding plays a role analogous to that of feedback in channel coding. 

While the theorems in this paper are asymptotic in terms of long delays and low probabilities of error, an 
example is used to show that the qualitative effects described here are significant even at short and moderate delays. 
The price of ignorance: The impact of side-information on delay for lossless source-coding

I. Introduction 

There are two surprising classical results pertaining to encoder "ignorance:" Shannon's finding in [1] that the 
capacity of a memoryless channel is unchanged if the encoder has access to feedback and the Slepian-Wolf result in 
[2] that side-information at the encoder does not reduce the data-rate required for lossless compression. When the rate 
is not at the fundamental limit (capacity or conditional entropy), the error probability converges to zero exponentially 
in the allowed system delay — with block-length serving as the traditional proxy for delay in information theoretic 
studies. Dobrushin in [3] and Berlekamp in [4] followed up on Shannon's result to show that feedback also does 
not improve¹ the block-coding error exponent in the high-rate regime (close to capacity) for symmetric channels.
Similarly, Gallager in [6] and Csiszar and Korner in [7] showed that the block-coding error exponents for lossless 
source-coding also do not improve with encoder-side-information in the low rate regime (close to the conditional 
entropy). These results seemed to confirm the overall message that the advantages of encoder knowledge are 
limited to possible encoder/decoder implementation complexity reductions, not to anything more basic like rate or 
probability of error. 

Once low complexity channel codes were developed that did not need feedback, mathematical and operational 
duality (See e.g. [8], [9]) enabled corresponding advances in low complexity distributed source codes. These codes 
then enabled radical new architectures for media coding in which the complexity could be shifted from the encoder 
to the decoder [10], [11]. Even more provocatively, [12] introduced a new architecture for information-theoretic 
secure communication illustrated as a shift from Figure 1 to Figure 2. By viewing Shannon's one-time-pad from
[13] as virtual side information, Johnson, et al in [12] showed that despite being marginally white and uniform, 
encrypted data could be compressed just as effectively by a system that does not have access to the key, as long as 
decoding takes place jointly with decryption. However, all of this work followed the traditional fixed-block-length 
perspective on source and channel coding. 
Fig. 1. The traditional compression/encryption system for sources with redundancy. (Figure adapted from [12])



Recently, it has become clear that the behavior of fixed-block-length codes and fixed-delay codes can be quite
different in contexts where the message to be communicated is revealed to the encoder gradually as time progresses
rather than being known all at once. In our entire discussion, the assumption is that information arises as a stream
generated in real time at the source (e.g. voice, video, or sensor measurements) and it is useful to the destination in
finely grained increments (e.g. a few milliseconds of voice, a single video frame, etc.). The encoded bitstream is also
assumed to be transported at a steady rate. The acceptable end-to-end delay is determined by the application and can
often be much larger than the natural granularity of the information being communicated (e.g. voice may tolerate
a delay of hundreds of milliseconds despite being useful in increments of a few milliseconds). The end-to-end
delay perspective here is common in the networking community. This is different from cases in which information
arises in large bursts with each burst needing to be received by the destination before the next burst even becomes
available at the source.

1 The history of feedback and its impact on channel reliability is reviewed in detail in [5].

Fig. 2. The novel compression/encryption system in which a message is first encrypted and then compressed by the "ignorant" encoder.
(Figure adapted from [12])

[5] shows that unlike the block channel coding reliability functions, the reliability function with respect to 
fixed end-to-end delay can in fact improve dramatically with feedback for essentially all DMCs at high rates.²
The asymptotic factor reduction in end-to-end delay enabled by feedback approaches infinity as the message rate 
approaches capacity for generic DMCs. In addition, the nature of the dominant error events changes. Consider 
time relative to when a message symbol enters the encoder. Without feedback, errors are usually caused by future 
channel atypicality. When feedback is present, it is a combination of past and future atypicality that causes errors. 

The results in [5] give a precise interpretation to the channel-coding half of Shannon's intriguingly prophetic 
comment at the close of [16]: 

"[the duality of source and channel coding] can be pursued further and is related to a duality between 
past and future and the notions of control and knowledge. Thus we may have knowledge of the past and 
cannot control it; we may control the future but have no knowledge of it." 

One of the side benefits of this paper is to make Shannon's comment similarly precise on the source coding 
side. Rather than worrying about what the appropriate granularity of information should be, the formal problem is 
specified at the individual source symbol level. If a symbol is not delivered correctly by its deadline, it is considered 
to be erroneous. The upper and lower bounds of this paper turn out to not depend on the choice of information 
granularity, only on the fact that the granularity is much finer than the tolerable end-to-end delay. 

Here, we show that when decoder side-information is also available at the encoder, the dominant error event 
involves only the past atypicality of the source. This gives an upper bound on the fixed-delay error exponent that 
is the lossless source-coding counterpart to the "uncertainty focusing bound" given in [5] for channel coding with 
feedback. This bound is also shown to be asymptotically achievable at all rates. When side-information is present 
only at the decoder, [17] showed that the much slower random-coding error exponent is attainable with end-to-end 
delay. Here, an upper bound is given on the error exponent that matches the random-coding bound from [17] at 
low rates for appropriately symmetric cases — like the case of compressing encrypted data from [12]. This shows 
that there is a fundamental price of encoder ignorance that must be paid in terms of required end-to-end delay. 

Section II fixes notation, gives the problem setup, and states the main results of this paper after reviewing
the relevant classical results. Section III evaluates a specific numerical example to show the penalties of encoder
ignorance. It also demonstrates how the delay penalty continues to be substantial even in the non-asymptotic regime
of short end-to-end delays and moderately small probability of error requirements. Section IV gives the proof for
the fixed delay reliability function when both encoder and decoder have access to side-information. Section V
proves the upper-bound on the fixed-delay reliability function when the encoder is ignorant of the side-information
and the appendices show that it is tight for the symmetric case. Finally, Section VI gives some concluding remarks
by pointing out the parallels between the source and channel coding stories.

2 It had long been known that the reliability function with respect to average block-length can improve [14], but there was a mistaken
assertion by Pinsker in [15] that the fixed-delay exponents do not improve with feedback.



II. Notation, problem setup and main results 

In this paper, all sources are iid random processes from finite alphabets, where the finite alphabets are identified
with the first few non-negative integers. $\mathsf{x}$ and $\mathsf{y}$ are random variables taking values in $\mathcal{X}$ and $\mathcal{Y}$, with $x$ and $y$
used to denote realizations of the random variables. Without loss of generality, assume that $\forall x \in \mathcal{X}, \forall y \in \mathcal{Y}$, the
marginals $p_x(x) > 0$ and $p_y(y) > 0$. The basic problem formulation is illustrated in Figure 3 for the cases with or
without encoder access to the side-information.
Fig. 3. Lossless source coding with encoder/decoder side-information.



The goal is to losslessly communicate the source $x$, drawn from a joint distribution $p_{xy}$ on $\mathcal{X} \times \mathcal{Y}$, over a fixed
rate bit-pipe. The decoder is always assumed to have access to the side-information $y$, and it may or may not be
available to the encoder as well.

Rather than being known entirely in advance, the source symbols enter the encoder in a real-time fashion
(illustrated in Figure 4). For convenience, time is counted in terms of source symbols: we assume that the source $\mathcal{S}$
generates a pair of source symbols $(x, y)$ per second from the finite alphabet $\mathcal{X} \times \mathcal{Y}$. The $j$'th source symbol $x_j$ is
not known at the encoder until time $j$ and similarly for $y_j$ at the decoder (and possibly encoder). Rate $R$ operation
means that the encoder sends 1 binary bit to the decoder every $\frac{1}{R}$ seconds. Throughout the paper the focus is on
cases with $H(x|y) < R < \log_2 |\mathcal{X}|$, since the lossless coding problem becomes trivial outside of that range.

Source:      $x_1$   $x_2$   $x_3$   $x_4$   $x_5$   $x_6$   ...
Encoding:            $b_1(x_1^2)$      $b_2(x_1^4)$      $b_3(x_1^6)$   ...
Rate limited channel
Side-info $y_1, y_2, \ldots$ → Decoding:      $\hat{x}_1(4)$   $\hat{x}_2(5)$   $\hat{x}_3(6)$   ...

Fig. 4. Time line for fixed-delay source coding with decoder side-information: rate $R = \frac{1}{2}$, delay $\Delta = 3$.

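Definition 1: A rate $R$ encoder $\mathcal{E}$ is a sequence of maps $\{\mathcal{E}_j\}$, $j = 1, 2, \ldots$. The outputs of $\mathcal{E}_j$ are the bits that
are communicated from time $j - 1$ to $j$. When the encoder does not have access to the decoder side-information:

$$\mathcal{E}_j : \mathcal{X}^j \rightarrow \{0,1\}^{\lfloor jR \rfloor - \lfloor (j-1)R \rfloor}, \qquad \mathcal{E}_j(x_1^j) = b_{\lfloor (j-1)R \rfloor + 1}^{\lfloor jR \rfloor}$$

When the encoder does have access to the decoder side-information:

$$\mathcal{E}_j : \mathcal{X}^j \times \mathcal{Y}^j \rightarrow \{0,1\}^{\lfloor jR \rfloor - \lfloor (j-1)R \rfloor}, \qquad \mathcal{E}_j(x_1^j, y_1^j) = b_{\lfloor (j-1)R \rfloor + 1}^{\lfloor jR \rfloor}$$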
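As a rough illustration of Definition 1, the bit schedule implied by a rate $R$ encoder can be sketched in Python. The helper name `bits_emitted` is ours rather than the paper's, and $R$ is assumed to be rational so that the floors are computed exactly.

```python
from fractions import Fraction
from math import floor

def bits_emitted(j: int, rate: Fraction) -> int:
    """Number of bits the encoder map E_j sends between time j-1 and time j."""
    return floor(j * rate) - floor((j - 1) * rate)

# Rate 1/2: one bit is released every two source symbols.
schedule = [bits_emitted(j, Fraction(1, 2)) for j in range(1, 7)]
print(schedule)  # [0, 1, 0, 1, 0, 1]
```

Summing the schedule up to time $j$ always gives $\lfloor jR \rfloor$, which is the number of bits available to a decoder at time $j$.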

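Definition 2: A fixed delay $\Delta$ decoder $\mathcal{D}^\Delta$ is a sequence of maps $\{\mathcal{D}_j^\Delta\}$, $j = 1, 2, \ldots$. The inputs to $\mathcal{D}_j^\Delta$ are
all the bits emitted by the encoder until time $j + \Delta$ as well as the side-information $y_1^{j+\Delta}$. The output is an estimate
$\hat{x}_j$ for the source symbol $x_j$.

Alternatively, a family of decoders indexed by different delays can be considered together. For these, the output
is a list $\hat{x}(j) = (\hat{x}_1(j), \ldots, \hat{x}_j(j))$.

$$\mathcal{D}_j^\Delta : \{0,1\}^{\lfloor jR \rfloor} \times \mathcal{Y}^j \rightarrow \mathcal{X}$$
$$\mathcal{D}_j^\Delta(b_1^{\lfloor jR \rfloor}, y_1^j) = \hat{x}_{j-\Delta}(j)$$

where $\hat{x}_{j-\Delta}(j)$ is the estimate of $x_{j-\Delta}$ at time $j$ and thus has an end-to-end delay of $\Delta$ seconds.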

The problem of lossless source-coding is considered by examining the asymptotic tradeoff between delay and 
the probability of symbol error: 

Definition 3: A family (indexed by delay $\Delta$) of rate $R$ sequential source codes $\{(\mathcal{E}^\Delta, \mathcal{D}^\Delta)\}$ achieves fixed-delay
reliability $E(R)$ if for all $\epsilon > 0$, there exists $K < \infty$, s.t. $\forall i, \Delta > 0$

$$\Pr(x_i \neq \hat{x}_i(i + \Delta)) \leq K 2^{-\Delta(E(R) - \epsilon)}$$

when encoder $\mathcal{E}^\Delta$ is used to do the encoding of the source and $\mathcal{D}^\Delta$ is the decoder used to recover $x$.

It is important to see that all source positions i require equal protection in terms of probability of error, but the 
probability of error can never be made uniform over the source realizations themselves since it is the source that 
is the main source of randomness in the problem! 



A. Review of block source coding with side information 

Before stating the new results, it is useful to review the classical fixed-block-length coding results. In
fixed-block-length coding, the encoder has access to $x_1^n$ all at once (as well as $y_1^n$ if it has access to the side-information)
and produces $nR$ bits all at once. These bits go to the block decoder along with the side-information $y_1^n$ and the
decoder then produces estimates $\hat{x}_1^n$ all at once. While the usual error probability considered is the block error
probability $\Pr(x_1^n \neq \hat{x}_1^n) = \Pr(x_1^n \neq \mathcal{D}_n(\mathcal{E}_n(x_1^n)))$, there is no difference between the symbol error probability and
the block-error probability on an exponential scale.

The relevant error exponents $E(R)$ are considered in the limit of large block-lengths, rather than end-to-end
delays. $E(R)$ is achievable if $\exists$ a family of $\{(\mathcal{E}_n, \mathcal{D}_n)\}$, s.t.

$$\lim_{n \rightarrow \infty} -\frac{1}{n} \log_2 \Pr(x_1^n \neq \hat{x}_1^n) = E(R) \tag{1}$$
The relevant results of [7], [6] are summarized into the following theorem. 

Theorem 1: If the block-encoder does not have access to the side-information, the best possible block-error
exponent is sandwiched between two bounds: $E^l_{si,b}(R) \leq E_{si,b}(R) \leq E^u_{si,b}(R)$ where

$$E^l_{si,b}(R) = \min_{q_{xy}} \{ D(q_{xy} \| p_{xy}) + \max\{0, R - H(q_{x|y})\} \} \tag{2}$$
$$= \sup_{0 \leq \rho \leq 1} \rho R - E_0(\rho) \tag{3}$$
$$E^u_{si,b}(R) = \min_{q_{xy} : H(q_{x|y}) \geq R} D(q_{xy} \| p_{xy}) \tag{4}$$
$$= \sup_{0 \leq \rho} \rho R - E_0(\rho) \tag{5}$$

where

$$E_0(\rho) = \log_2 \sum_y \Big( \sum_x p_{xy}(x, y)^{\frac{1}{1+\rho}} \Big)^{(1+\rho)} \tag{6}$$

is the Gallager function for the source with side-information.
The lower-bound corresponds to the performance of random-binning with MAP decoding. The upper and lower
bounds agree for rates close to $H(p_{x|y})$, specifically $R \leq \frac{\partial E_0(\rho)}{\partial \rho}\big|_{\rho=1}$.

If the encoder also has access to the side-information, then $E^u_{si,b}(R)$ is the true error exponent at all rates since
it can be achieved by simply encoding the conditional type of $x_1^n$ given $y_1^n$ and then encoding the index of the true
realization within that conditional type.

If there is no side-information, then $y = 0$ and the problem behaves like the case of side-information known
at the encoder. (4) recovers the simple point-to-point fixed-block-length error exponent for lossless source coding.
The resulting random and non-random error exponents are:

$$E^r_{s,b}(R, p_x) = \min_{q_x} \{ D(q_x \| p_x) + \max\{0, R - H(q_x)\} \} \tag{7}$$
$$E_{s,b}(R, p_x) = \min_{q_x : H(q_x) \geq R} D(q_x \| p_x). \tag{8}$$

The Gallager function (6) in (3) and (5) also simplifies to

$$E_0(\rho) = (1 + \rho) \log_2 \Big( \sum_x p_x(x)^{\frac{1}{1+\rho}} \Big). \tag{9}$$
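The point-to-point exponents can be evaluated numerically from the Gallager function in (9). The following is a minimal sketch, not the authors' code: the function names are ours, the supremum over $\rho$ is approximated by a grid search (an assumption made for simplicity), and the example distribution is the ternary source with $a = 0.65$ used in the numeric example of Section III.

```python
from math import log2

def gallager_e0(rho, p):
    """E_0(rho) = (1 + rho) * log2(sum_x p(x)^(1/(1+rho))), as in (9)."""
    return (1.0 + rho) * log2(sum(px ** (1.0 / (1.0 + rho)) for px in p))

def e0_slope(rho, p):
    """dE_0/drho, which equals the entropy (in bits) of the tilted distribution."""
    w = [px ** (1.0 / (1.0 + rho)) for px in p]
    total = sum(w)
    return -sum((wi / total) * log2(wi / total) for wi in w)

def exponent(rate, p, rho_max, steps=50000):
    """Grid approximation of sup over 0 <= rho <= rho_max of rho*R - E_0(rho)."""
    grid = (rho_max * i / steps for i in range(steps + 1))
    return max(rho * rate - gallager_e0(rho, p) for rho in grid)

def block_exponent(rate, p):
    """Fixed-block-length exponent: sup over rho >= 0 (truncated at rho = 50)."""
    return exponent(rate, p, rho_max=50.0)

def random_coding_exponent(rate, p):
    """Random coding exponent: sup over 0 <= rho <= 1."""
    return exponent(rate, p, rho_max=1.0)

p_s = [0.65, 0.175, 0.175]
critical_rate = e0_slope(1.0, p_s)  # the two exponents agree below this rate
print(round(critical_rate, 3), block_exponent(1.5, p_s), random_coding_exponent(1.5, p_s))
```

Below the critical rate the maximizing $\rho$ lies in $[0, 1]$, so the two functions return (numerically) the same value; above it the block exponent is strictly larger.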



B. Main results 

[17] shows that the random coding bound $E^l_{si,b}(R)$ is achievable with respect to end-to-end delay even without
the encoder having access to the side-information. Thus, the factor of two increase in delay caused by using a 
fixed-block-length code in a real-time context is unnecessary. [17] uses a randomized sequential binning strategy 
with either MAP decoding or a universal decoding scheme that works for any iid source. [18] shows that the same 
asymptotic tradeoff with delay is achievable using a more computationally friendly stack-based decoding algorithm 
if the underlying joint distribution is known. However, it turns out that the end-to-end delay performance can be 
much better if the encoder has access to the side-information. 

Theorem 2: For fixed rate $R$ lossless source-coding of an iid source with side-information present at both the
receiver and encoder, the asymptotic error exponent $E_{ei}(R)$ with fixed end-to-end delay is given by the source
uncertainty-focusing bound:

$$E_{ei}(R) = \inf_{\alpha > 0} \frac{1}{\alpha} E^u_{si,b}((\alpha + 1)R) \tag{10}$$

where $E^u_{si,b}$ is defined in (4) and (5). The source uncertainty-focusing bound can also be expressed parametrically
in terms of the Gallager function $E_0(\rho)$ from (6):

$$E_{ei}(R) = E_0(\rho), \qquad R = \frac{E_0(\rho)}{\rho} \tag{11}$$

This bound generically approaches $R = H(x|y)$ at strictly positive slope of $2H(x|y) / \frac{\partial^2 E_0(0)}{\partial \rho^2}$. When $\frac{\partial^2 E_0(0)}{\partial \rho^2} = 0$,
the fixed-delay reliability function jumps discontinuously from zero to infinity.

Furthermore, this bound is asymptotically achievable by using universal fixed-to-variable block codes whose 
resulting data bits are smoothed to fixed-rate R through a FIFO queue with an infinite buffer size. This code is 
universal over iid sources as well as end-to-end delays that are sufficiently long (the block-length for the code is 
much smaller than the asymptotically large end-to-end delay constraint). 
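The parametric form (11) lends itself to a simple numerical evaluation. The sketch below is ours and covers only the case without side-information, where $E_0(\rho)$ takes the form (9); it assumes that $E_0(\rho)/\rho$ is nondecreasing in $\rho$ (which follows from convexity of $E_0$ with $E_0(0) = 0$) and solves $E_0(\rho)/\rho = R$ by bisection.

```python
from math import log2

def gallager_e0(rho, p):
    """E_0(rho) for a source without side-information, as in (9)."""
    return (1.0 + rho) * log2(sum(px ** (1.0 / (1.0 + rho)) for px in p))

def focusing_bound(rate, p, tol=1e-12):
    """Return E_0(rho*) where rho* solves E_0(rho)/rho = rate, as in (11)."""
    entropy = -sum(px * log2(px) for px in p)
    if not entropy < rate < log2(len(p)):
        raise ValueError("rate must lie strictly between H(p) and log2|X|")
    lo, hi = 1e-9, 1.0
    while gallager_e0(hi, p) / hi < rate:  # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if gallager_e0(mid, p) / mid < rate:
            lo = mid
        else:
            hi = mid
    return gallager_e0(hi, p)

p_s = [0.65, 0.175, 0.175]   # ternary example source with a = 0.65
e_ei = focusing_bound(1.5, p_s)
print(e_ei)
```

For this source at $R = 1.5$ the resulting exponent is far larger than the fixed-block-length exponent at the same rate, which is the gap that the ratio in Figure 10 quantifies.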

Theorem 3: For fixed rate $R$ lossless source-coding of an iid source with side-information only at the receiver,
the asymptotic error exponent $E_{si}(R)$ with fixed end-to-end delay must satisfy $E_{si}(R) \leq E^u_{si}(R)$, where

$$E^u_{si}(R) = \min\Big\{ \inf_{q_{xy}, \alpha \geq 1 : H(q_{x|y}) > (1+\alpha)R} \frac{1}{\alpha} D(q_{xy} \| p_{xy}), \; \inf_{q_{xy}, 1 \geq \alpha \geq 0 : H(q_{x|y}) > (1+\alpha)R} \frac{1-\alpha}{\alpha} D(q_x \| p_x) + D(q_{xy} \| p_{xy}) \Big\} \tag{12}$$

For symmetric cases (such as those depicted in Figure 5), we have the following corollary:

Fig. 5. A joint distribution on $x, y$ that comes from a discrete memoryless channel connecting the two together and where the $y$ is uniform
and independent of the channel.



Corollary 1: Consider iid $(x, y) \sim p_{xy}$ such that the side-information $y$ is uniform on $\mathcal{Y}$ and $x = y \oplus s$,
where $s \sim p_s$ is independent of $y$. Then the asymptotic error exponent $E_{si}(R)$ with fixed delay must satisfy
$E_{si}(R) \leq E^u_{si,b}(R) = E_{s,b}(R, p_s)$ from (4) and (8).

Since [17] shows that the random coding bound $E^l_{si,b}(R)$ is achievable, and it coincides with $E^u_{si,b}(R)$ at low rates, Corollary 1 is tight there.

III. Application and numeric example 

While the above results are general, they can be applied to the specific context of the [12] approach of compressing 
encrypted data. The general problem is depicted in Figure [6] in terms of joint encryption and compression. The 
goal is to communicate from end-to-end using a reliable fixed-rate bit-pipe in such a way that: 

• The required rate of the bit-pipe is low. 

• The probability of error is low for each source symbol. 

• The end-to-end delay is small. 

• Nothing is revealed to an eavesdropper that has access to the bitstream. 

The idea is to find a good tradeoff among the first three while preserving the fourth. To support these goals, assume 
access to an infinite supply of common-randomness shared among the encoder and decoder that is not available to 
the eavesdropper. This can be used as a secret key. We are not concerned here with the size of the secret key. 

This section evaluates the fixed-delay performance for both the compression-first and encryption-first systems as 
a way of showing the delay price of the encoder's ignorance of the side-information in the encryption-first approach. 
Nonasymptotic behavior is explored using a specific code for short values of end-to-end delay to verify that
the price of ignorance also hits when delays are small. 

A. Encryption/compression of streaming data: asymptotic results 

The main results of this paper can be used to evaluate two candidate architectures: the traditional compression-first
approach depicted in Figure 1 and the novel encryption-first approach proposed in [12] and depicted in Figure 2.

Source:      $s_1$   $s_2$   $s_3$   $s_4$   $s_5$   $s_6$   ...
Compression/Encryption:      $b_1(s_1^2)$      $b_2(s_1^4)$      $b_3(s_1^6)$   ...
Rate limited channel
Decompression/Decryption:      $\hat{s}_1(4)$   $\hat{s}_2(5)$   $\hat{s}_3(6)$   ...

Fig. 6. Joint encryption and compression of streaming data with a fixed end-to-end delay constraint. Here the rate $R = \frac{1}{2}$ bits per source
symbol and delay $\Delta = 3$.
1) Compress and then encrypt: The traditional compression-first approach is immediately covered by Theorem 2
since the lack of side-information as far as compression is concerned can be modeled by having trivial
side-information $y = 0$ and $x = s$. In that case, the relevant error exponent with end-to-end delay is given by (11). The
secret key can simply be used to XOR the rate $R$ bitstream with a one-time-pad.

Fig. 7. The traditional approach of compression followed by encryption, for fixed-delay encoding at rate $R$ and delay $\Delta$.

2) Encrypt and then compress: For the new approach of [12], the secret key is used at a rate of $\log_2 |\mathcal{S}|$ bits per
source-symbol to generate iid uniform virtual side-information random variables $y$ on the alphabet $\mathcal{Y} = \mathcal{X} = \mathcal{S}$.
The virtual source is generated by $x = s \oplus y$ where the $\oplus$ operation is interpreted as addition in the Abelian group modulo
$|\mathcal{S}|$. It is clear from [13] that the mutual information $I(s_1^n; x_1^n) = 0$ for all $n$ and since there is a Markov chain
$s \rightarrow x \rightarrow b$ to the encoded data bits, the eavesdropper learns nothing about the source symbols. Given knowledge
of the secret key $y$, decoding $x$ correctly is equivalent to decoding $s$ correctly. Thus, the conditional entropy
$H(x|y) = H(s \oplus y|y) = H(s|y) = H(s)$ so nothing is lost in terms of compressibility.
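The one-time-pad construction over a non-binary alphabet can be sketched as follows. This is an illustrative sketch only: the alphabet size 3 and the function names `encrypt` and `decrypt` are our assumptions, with symbols represented as integers $0, \ldots, |\mathcal{S}| - 1$.

```python
from itertools import product

ALPHABET_SIZE = 3  # |S|, an illustrative choice

def encrypt(s: int, y: int, m: int = ALPHABET_SIZE) -> int:
    """Virtual source symbol x = s (+) y, with addition modulo m."""
    return (s + y) % m

def decrypt(x: int, y: int, m: int = ALPHABET_SIZE) -> int:
    """Recover s from the encrypted symbol x and the key symbol y."""
    return (x - y) % m

# For every fixed s, sweeping the key y over the alphabet hits every x exactly once,
# so a uniform key makes x uniform regardless of s.
uniform_for_every_s = all(
    sorted(encrypt(s, y) for y in range(ALPHABET_SIZE)) == list(range(ALPHABET_SIZE))
    for s in range(ALPHABET_SIZE)
)
round_trip_ok = all(
    decrypt(encrypt(s, y), y) == s
    for s, y in product(range(ALPHABET_SIZE), repeat=2)
)
print(uniform_for_every_s, round_trip_ok)
```

The first check mirrors the statement that the encrypted symbol alone carries no information about $s$; the second mirrors the statement that knowing $y$ makes decoding $x$ equivalent to decoding $s$.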
Fig. 8. Fixed-delay compression of already encrypted streaming data at rate $R$ and delay $\Delta$.



Meanwhile, the marginal distributions for both the encrypted data $x$ and the secret key $y$ are uniform. Since the
conditions for Corollary 1 hold, the upper bound on the error exponent with delay is $E^u_{si,b}(R)$ from (4) and (5).
[17] guarantees that a sequential random binning strategy can achieve the exponent in (2) and (3).

This means that nothing higher than the fixed-block-length error exponent for source coding can be achieved 
with respect to end-to-end delay if the encryption-first architecture is adopted with the requirement that nothing 
about the true source be revealed to the compressor. As the next section illustrates by example, there is a severe 
delay price to requiring the encoder to be ignorant of the source. 
In practical terms, this means that if both the end-to-end delay and acceptable probability of symbol error are
constrained by the application, then the approach of encryption followed by compression can end up requiring 
higher-rate bit-pipes. 

B. Numeric example including nonasymptotic results 

Consider a simple source $s$ with alphabet size 3, $\mathcal{S} = \{A, B, C\}$ and distribution

$$p_s(A) = a, \qquad p_s(B) = \frac{1-a}{2}, \qquad p_s(C) = \frac{1-a}{2}$$

where $a = 0.65$ for the plots and numeric comparisons.

1) Asymptotic error exponents: The different error exponents for fixed-block-length and fixed-delay source coding
predict the asymptotic performance of different source coding systems when the end-to-end delay is long. We plot
the source uncertainty-focusing bound $E_{ei}(R)$, the fixed-block-length error exponent $E_{s,b}(R, p_s)$ and the random
coding bound $E^r_{s,b}(R, p_s)$ in Figure 9. For this source, the random coding and fixed-block-length error exponents
are the same for $R \leq \frac{\partial E_0(\rho)}{\partial \rho}\big|_{\rho=1} = 1.509$. Theorem 1 and Theorem 2 reveal that these error exponents govern the
asymptotic performance of fixed-delay systems with and without encoder side-information.




Fig. 9. Different source coding error exponents: fixed-delay error exponent $E_{ei}(R)$ with encoder side-information, fixed-block-length error
exponent $E_{s,b}(R, p_s)$, and the random coding bound $E^r_{s,b}(R, p_s)$. The fixed-block-length bound also bounds the fixed-delay case without
encoder side-information since the example here is symmetric.

Figure 10 plots the ratio of the source uncertainty-focusing bound over the fixed-block-length error exponent.
The ratio tells asymptotically how many times longer the delay must be for the system built around an encoder 
that does not have access to the side-information. The smallest ratio is around 52 at a rate around 1.45. 

2) Non-asymptotic results: The price of ignorance is so high that even non-optimal codes with encoder
side-information can outperform optimal codes without it. This section uses a very simple fixed-delay coding scheme
using a prefix-free fixed-to-variable code [19] instead of the asymptotically optimal universal code described in
Theorem 2. The input block-length is two, and the encoder uses the side-information to recover $s$ before encoding
it as:

AA -> 0
AB -> 1000    AC -> 1001    BA -> 1010    BB -> 1011
BC -> 1100    CA -> 1101    CB -> 1110    CC -> 1111
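The code table above can be exercised directly. In this sketch the codeword for AA is taken to be the single bit 0, which is what the stated codeword lengths (1 or 4) and the prefix-free property imply; the helper names are ours.

```python
from itertools import product

CODE = {"AA": "0", "AB": "1000", "AC": "1001", "BA": "1010", "BB": "1011",
        "BC": "1100", "CA": "1101", "CB": "1110", "CC": "1111"}

def is_prefix_free(code):
    """True if no codeword is a proper prefix of another codeword."""
    words = list(code.values())
    return not any(u != v and v.startswith(u) for u in words for v in words)

def encode(symbols):
    """Encode an even-length string over {A, B, C}, two symbols at a time."""
    return "".join(CODE[symbols[i:i + 2]] for i in range(0, len(symbols), 2))

def decode(bits):
    """Greedy decoding, which is unambiguous because the code is prefix-free."""
    inverse = {word: pair for pair, word in CODE.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)

a = 0.65
prob = {"A": a, "B": (1 - a) / 2, "C": (1 - a) / 2}
# Average number of code bits produced per pair of source symbols.
expected_bits = sum(prob[u] * prob[v] * len(CODE[u + v])
                    for u, v in product("ABC", repeat=2))
print(is_prefix_free(CODE), decode(encode("AABCCA")), expected_bits)
```

The average output of the code stays below 3 bits per pair of symbols for $a = 0.65$, which is why a bit-pipe carrying 3 bits every 2 seconds can keep up with it on average.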

For ease of analysis, the system is run at $R = \frac{3}{2} < \frac{\partial E_0(\rho)}{\partial \rho}\big|_{\rho=1} = 1.509$. This means that the source generates
1 symbol per second and 3 bits are sent through the error-free bit-pipe every 2 seconds. The variable-rate of the
code is smoothed through a FIFO queue with an infinite buffer in a manner similar to the buffer-overflow problem
studied in [20], [21]. The entire coding system is illustrated in Figure 11.

Fig. 10. Ratio of the fixed-delay error exponent with encoder side-information $E_{ei}(R)$ over the fixed-block-length error exponent $E_{s,b}(R, p_s)$.
This reflects the asymptotic factor increase in end-to-end delay required to compensate for the encoder being ignorant of the side-information
available at the decoder.

It is convenient to examine time in increments of two seconds. The length of the codeword generated is either 1 
or 4. The buffer is drained out by 3 bits per 2 seconds. Let $L_k$ be the number of bits in the buffer at time $2k$.
Every two seconds, the number of bits in the buffer either goes down by 2 if $s_{2k-1}, s_{2k} = AA$ or goes up by 1
if $s_{2k-1}, s_{2k} \neq AA$. If the queue is empty, the encoder can send arbitrary bits through the bit-pipe without causing
confusion at the decoder because the decoder knows that the source only generates 1 source symbol per second 
and that it is caught up. 
Fig. 11. Suboptimal prefix coding system in action. / indicates empty queue, * indicates meaningless filler bits.



Clearly $L_k$, $k = 1, 2, \ldots$ forms a Markov chain with the following transition probabilities: $L_k = L_{k-1} + 1$ with probability
$1 - a^2$, $L_k = L_{k-1} - 2$ with probability $a^2$. The state transition graph is illustrated in Figure 12. For this Markov
chain, the stationary distribution can be readily calculated³ [22].

$$\pi_k = Z \left( \frac{-1 + \sqrt{1 + \frac{4(1-q)}{q}}}{2} \right)^k \tag{13}$$

where $q = a^2$ and $Z = 1 - \frac{-1 + \sqrt{1 + \frac{4(1-q)}{q}}}{2}$ is the normalization constant. For this example $Z = 0.228$. Notice that
$\pi_k$ is geometric and the stationary distribution exists as long as $\frac{4(1-q)}{q} < 8$ or equivalently $q > \frac{1}{3}$. In this example,
$a = 0.65$ and thus $q = a^2 = 0.4225 > \frac{1}{3}$, so the stationary distribution $\pi_k$ exists.
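The geometric form of the stationary distribution can be checked numerically against the balance equations of the queue. The sketch below is ours; it assumes that the walk is reflected at zero in the sense $L_k = \max(L_{k-1} - 2, 0)$ on a downward step, and the function name `stationary_ratio` is hypothetical.

```python
from math import sqrt

def stationary_ratio(q):
    """Stable root r of q*r**2 + q*r - (1 - q) = 0, so that pi_k = Z * r**k."""
    return (-1.0 + sqrt(1.0 + 4.0 * (1.0 - q) / q)) / 2.0

a = 0.65
q = a * a
r = stationary_ratio(q)
Z = 1.0 - r                       # normalization: sum over k >= 0 of Z * r**k equals 1
pi = [Z * r ** k for k in range(200)]

# State 0 is entered only by a downward step from states 0, 1 or 2.
boundary_gap = abs(pi[0] - q * (pi[0] + pi[1] + pi[2]))
# State k >= 1 is entered by an upward step from k-1 or a downward step from k+2.
interior_gap = max(abs(pi[k] - ((1 - q) * pi[k - 1] + q * pi[k + 2]))
                   for k in range(1, 150))
print(round(Z, 3), boundary_gap < 1e-12, interior_gap < 1e-12)
```

Both gaps vanish up to floating-point error, and the computed $Z$ agrees with the value quoted in the text.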




Fig. 12. Transition graph of a reflecting random walk $L_k$ for queue length given the specified prefix-free code and the ternary source
distribution $\{a, \frac{1-a}{2}, \frac{1-a}{2}\}$ and fixed rate $\frac{3}{2}$ bits per source symbol. $q = a^2$ denotes the probability that $L_k$ decrements by 2.



Assume $\Delta$ is odd for convenience. For the above simple coding system, a decoding error can only happen if at
time $2k - 1 + \Delta$, at least one bit of the codeword describing $s_{2k-1}, s_{2k}$ is still in the queue. Since the queue is
FIFO, this implies that there were too many bits awaiting transmission at time $2k$ itself, i.e. that the number of
bits $L_k$ in the buffer at time $2k$ is larger than

$$\lfloor \tfrac{3}{2}(\Delta - 1) \rfloor - l(s_{2k-1}, s_{2k})$$

where $l(s_{2k-1}, s_{2k})$ is the length of the codeword for $s_{2k-1}, s_{2k}$. $l$ is 1 with probability $q = a^2$ and 4 with probability
$1 - q = 1 - a^2$. Notice that the length of the codeword for $s_{2k-1}, s_{2k}$ is independent of $L_k$ since the source symbols
are iid. This gives the following upper bound on the error probability of decoding with delay $\Delta$ when the system
is in steady state⁴:

$$\begin{aligned}
\Pr(\hat{s}_{2k}(2k - 1 + \Delta) \neq s_{2k}) &= \Pr(\hat{s}_{2k-1}(2k - 1 + \Delta) \neq s_{2k-1}) \\
&\leq \Pr(l(s_{2k-1}, s_{2k}) = 1) \Pr(L_k > \lfloor \tfrac{3}{2}(\Delta - 1) \rfloor - 1) \\
&\quad + \Pr(l(s_{2k-1}, s_{2k}) = 4) \Pr(L_k > \lfloor \tfrac{3}{2}(\Delta - 1) \rfloor - 4) \\
&= q \sum_{j = \lfloor \frac{3}{2}(\Delta - 1) \rfloor}^{\infty} \pi_j + (1 - q) \sum_{j = \lfloor \frac{3}{2}(\Delta - 1) \rfloor - 3}^{\infty} \pi_j \\
&= G \left( \frac{-1 + \sqrt{1 + \frac{4(1-q)}{q}}}{2} \right)^{\lfloor \frac{3}{2}(\Delta - 1) \rfloor - 3}
\end{aligned}$$

where $G$ is the normalization constant, a function of $Z$ and $q$. For this example, $G = 0.360$. Thus, the fixed-delay error exponent for this coding system is

$$\frac{3}{2} \log_2 \left( \frac{2}{-1 + \sqrt{1 + \frac{4(1-q)}{q}}} \right).$$
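To get a feel for the numbers, the steady-state bound can be evaluated directly from the two tail sums of the geometric stationary distribution. This is a sketch under our own assumptions: it uses $\pi_k = (1 - r) r^k$ so that $\Pr(L_k \geq m) = r^m$, it sidesteps the constant $G$ by summing the tails explicitly, and the function names are hypothetical.

```python
from math import sqrt, log2

def ratio(q):
    """Geometric ratio r of the stationary queue-length distribution."""
    return (-1.0 + sqrt(1.0 + 4.0 * (1.0 - q) / q)) / 2.0

def tail(m, r):
    """Pr(L >= m) when pi_k = (1 - r) * r**k; equals 1 for m <= 0."""
    return r ** max(m, 0)

def error_bound(delay, q):
    """q * Pr(L >= m) + (1 - q) * Pr(L >= m - 3) with m = floor(3 * (delay - 1) / 2)."""
    r = ratio(q)
    m = (3 * (delay - 1)) // 2
    return q * tail(m, r) + (1.0 - q) * tail(m - 3, r)

def smallest_odd_delay(target, q):
    """Smallest odd delay for which the bound drops to the target error probability."""
    delay = 1
    while error_bound(delay, q) > target:
        delay += 2
    return delay

q = 0.65 ** 2
exponent = 1.5 * log2(1.0 / ratio(q))     # decay rate of the bound, in bits per unit delay
delay_needed = smallest_odd_delay(1e-6, q)
print(exponent, delay_needed)
```

The delay returned for a $10^{-6}$ target is of the same order as the value of roughly 40 read off Figure 13 for this scheme.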



3 The polynomial corresponding to the recurrence relation for the stationary distribution has three roots. One of them is 1 and another is
unstable since it has magnitude larger than 1. That leaves only one possibility for the stationary distribution.

4 If the system is initialized to start in the zero state, then this bound remains valid since the system approaches steady state from below. 
Figure 13 compares three different coding schemes in the non-asymptotic regime of short delays and moderate
probabilities of error at rate $\frac{3}{2}$. As shown in Figure 9, at this rate the random coding error exponent $E^r_{s,b}(R, p_s)$ is the
same as the fixed-block-length error exponent $E_{s,b}(R, p_s)$. The block coding curve plotted is for an optimal coding
scheme in which the encoder first buffers up $\frac{\Delta}{2}$ symbols, encodes them into a length-$\frac{\Delta}{2}R$ binary sequence
and uses the next $\frac{\Delta}{2}$ seconds to transmit the message. This coding scheme gives an error exponent $\frac{E_{s,b}(R, p_s)}{2}$ with
delay in the limit of long delays.




Fig. 13. Error probability vs delay (non-asymptotic results) illustrating the price of encoder ignorance. (Horizontal axis: end-to-end delay.)

The slope of these curves in Figure 13 indicates the error exponent governing how fast the error probability goes
to zero with delay. Although smaller than the delay optimal error exponent $E_{ei}(R)$, this simple coding strategy has
a much higher fixed-delay error exponent than both sequential random coding and optimal simplex block coding. A
simple calculation reveals that in order to get a $10^{-6}$ symbol error probability, the delay requirement for our simple
scheme is approximately 40, for causal random coding it is approximately 303, and for optimal block coding it is approximately 374.⁵ Thus,
the price of encoder ignorance is very significant even in the non-asymptotic regime and fixed-block-length codes
are very suboptimal from an end-to-end delay point of view.

IV. Encoders with side-Information 
The goal of this section is to prove Theorem [2] directly. 

A. Achievability 

The achievability of $E_{ei}(R)$ is shown using a simple fixed-to-variable⁶ length universal code that has its output
rate smoothed through a FIFO queue. Because the end-to-end delay experienced by a symbol is dominated by the
time spent waiting in the queue, and the queue is drained at a deterministic rate, the end-to-end delay experienced by
a symbol is essentially proportional to the length of the queue when that symbol arrives. Thus on the achievability
side, Theorem 2 can be viewed as a corollary to results on the buffer-overflow exponent for fixed-to-variable length
codes. The buffer-overflow exponent was first derived in [20] for cases without any side-information at all. Here,
we simply state the coding strategy used and leave the detailed analysis for Appendix I.

The strategy only depends on the size of the source alphabets $|\mathcal{X}|$, $|\mathcal{Y}|$, not on the distribution of the source.

5 We ran a linear regression on the data $y_\Delta = \log_{10} P_e(\Delta)$, $x_\Delta = \Delta$ as shown in Figure 13 from $\Delta = 80$ to $\Delta = 100$ to extrapolate the
$\Delta$, s.t. $\log_{10} P_e(\Delta) = -6$.

6 Fixed-to-variable was chosen for ease of analysis. It is likely that variable-to-fixed and variable-to-variable length codes can also be used 
as the basis for an optimal fixed-delay source coding system. 
[Figure 14 block diagram. Labels: streaming data X; streaming side information Y; fixed-to-variable length encoder; FIFO encoder buffer; rate R bit stream; decoder.]





Fig. 14. A universal fixed-delay lossless source coding system built around a fixed-to-variable block-length code. 



First, a finite block-length $N$ is chosen that is much smaller than the asymptotically large target end-to-end delay
$\Delta$. For a discrete memoryless source $x$, side information $y$ and large block-length $N$, an optimal fixed-to-variable
code is given in [7] and consists of three stages:

1) Start with a 1.

2) Describe the joint type of the block $\vec{x}_i$ (the $i$-th block of length $N$) and $\vec{y}_i$. This costs at most a fixed
$1 + |\mathcal{X}||\mathcal{Y}| \log_2 N$ bits per block.

3) Describe which particular realization has occurred for $\vec{x}_i$ by using a variable $N H(\vec{x}_i|\vec{y}_i)$ bits, where $H(\vec{x}_i|\vec{y}_i)$
is the empirical conditional entropy of sequence $\vec{x}_i$ given $\vec{y}_i$.
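
The three stages above can be sketched as a codeword-length calculation. This is a hedged illustration rather than the code of [7]: the function names are hypothetical, and rounding each stage up to a whole number of bits and counting joint types with $\log_2(N+1)$ per alphabet pair are assumptions made here for concreteness.

```python
import math
from collections import Counter


def empirical_conditional_entropy(xs, ys):
    """Empirical H(x|y) in bits per symbol for two equal-length sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    marginal_y = Counter(ys)
    return -sum(c / n * math.log2(c / marginal_y[y]) for (_, y), c in joint.items())


def codeword_length_bits(xs, ys, x_alphabet_size, y_alphabet_size):
    """Length of the three-stage codeword for one block (illustrative rounding)."""
    n = len(xs)
    flag_bits = 1  # stage 1: the leading 1
    # Stage 2: describe the joint type (assumed: log2(n + 1) bits per alphabet pair).
    type_bits = math.ceil(x_alphabet_size * y_alphabet_size * math.log2(n + 1))
    # Stage 3: index of the realization within its conditional type class.
    index_bits = math.ceil(n * empirical_conditional_entropy(xs, ys))
    return flag_bits + type_bits + index_bits


predictable = codeword_length_bits([0, 1, 0, 1], [0, 1, 0, 1], 2, 2)
unpredictable = codeword_length_bits([0, 1, 0, 1], [0, 0, 0, 0], 2, 2)
```

Only the third stage varies with the realization: a block that is fully determined by its side information costs no index bits, while a block whose side information is uninformative costs about one bit per binary symbol.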

This code is obviously prefix-free. When the queue is empty, the fixed-rate $R$ encoder can send a 0 without
introducing any ambiguity, since every codeword starts with a 1. The total end-to-end delay experienced by any individual source symbol is then upper-bounded
by $N$ (how long it must wait to be assembled into a block) plus $\frac{1}{R}$ times the length of the queue once it
has been encoded.
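
The delay accounting just described can be sketched with a small queue simulation. This is a minimal sketch under stated assumptions: one codeword enters the FIFO per block, the buffer drains `block_len_symbols * rate` bits between arrivals, and the delay bound for a block is the block length plus the backlog divided by the rate. The function name and the example lengths are hypothetical.

```python
def queue_delay_bounds(codeword_lengths_bits, block_len_symbols, rate_bits_per_symbol):
    """Per-block upper bound on end-to-end delay, measured in source symbols."""
    backlog_bits = 0.0
    bounds = []
    drained_per_block = block_len_symbols * rate_bits_per_symbol
    for length in codeword_lengths_bits:
        backlog_bits += length  # the new codeword joins the FIFO
        # Wait to fill the block, then wait for everything ahead of and including
        # this codeword to leave the buffer at the fixed rate.
        bounds.append(block_len_symbols + backlog_bits / rate_bits_per_symbol)
        # Bits drained before the next codeword arrives.
        backlog_bits = max(0.0, backlog_bits - drained_per_block)
    return bounds


example_bounds = queue_delay_bounds([10, 30, 10], block_len_symbols=10, rate_bits_per_symbol=2.0)
```

In this example the long second codeword delays not only its own block but also the short block that follows it, which is the mechanism by which atypical source behavior turns into delay.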

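Write $l(\vec{x}_i, \vec{y}_i)$ as the random total length of the codeword for $\vec{x}_i$. Then

$$N H(\vec{x}_i|\vec{y}_i) \le l(\vec{x}_i, \vec{y}_i) \le N\big(H(\vec{x}_i|\vec{y}_i) + \epsilon_N\big) \tag{14}$$

where $\epsilon_N \le \frac{2 + |\mathcal{X}||\mathcal{Y}|\log_2(N+1)}{N}$ goes to 0 as $N$ gets large.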

Because the source is iid, the lengths of the blocks are also iid. Each one has a length whose distribution can be
bounded using Theorem 1. From there, there are two paths to show the desired result. One path uses Corollary 6.1
of [5] and for that, all that is required is a lemma parallel to Lemma 7.1 of [5] asserting that the length of the 
block has a distribution upperbounded by a constant plus a geometric random variable. Such a bound easily follows 
from the (0) formulation for the block-reliability function. We take a second approach proceeding directly using 
standard large deviations techniques. The following lemma bounds the probability of atypical source behavior for 
the sum of lengths. 

Lemma 1: For all $\epsilon > 0$, there exists a block length $N$ large enough so that there exists $K < \infty$ such that for
all $n \ge 0$ and all $H(x|y) < r < \log_2 |\mathcal{X}|$

$$\Pr\Big(\sum_{i=1}^{n} l(\vec{x}_i, \vec{y}_i) > nNr\Big) \le K 2^{-nN(E_{ei,b}(r) - \epsilon)} \tag{15}$$

Proof: See Appendix I.

At time $(t+\Delta)N$, the decoder cannot decode $\vec{x}_t$ with zero error probability iff the binary strings describing $\vec{x}_t$ are
not all out of the buffer yet. Since the encoding buffer is FIFO, this means that the number of outgoing bits from
some time $t_1$ to $(t+\Delta)N$ is less than the number of the bits in the buffer at time $t_1$ plus the number of incoming
bits from time $t_1$ to time $tN$. Suppose the buffer were last empty at time $t_1 = tN - nN$ where $0 \le n \le t$. Given
this, a decoding error could occur only if $\sum_{i=0}^{n-1} l(\vec{x}_{t-i}, \vec{y}_{t-i}) > (n+\Delta)NR$.
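Denote the longest code length by $l_{max} \le 2 + |\mathcal{X}||\mathcal{Y}|\log_2(N+1) + N\log_2|\mathcal{X}|$. Then $\Pr\big(\sum_{i=0}^{n-1} l(\vec{x}_{t-i}, \vec{y}_{t-i}) > (n+\Delta)NR\big) > 0$
only if $n > \frac{(n+\Delta)NR}{l_{max}} > \frac{\Delta N R}{l_{max}} = \beta\Delta$. So

$$
\begin{aligned}
\Pr\big(\vec{x}_t \ne \hat{\vec{x}}_t((t+\Delta)N)\big) &\le \sum_{n=\beta\Delta}^{t} \Pr\Big(\sum_{i=0}^{n-1} l(\vec{x}_{t-i}, \vec{y}_{t-i}) > (n+\Delta)NR\Big) \qquad (16)\\
&\le_{(a)} \sum_{n=\beta\Delta}^{t} K_1 2^{-nN\left(E_{ei,b}\left(\frac{(n+\Delta)R}{n}\right) - \epsilon_1\right)}\\
&\le_{(b)} \sum_{n=\gamma\Delta}^{\infty} K_2 2^{-nN(E_{ei,b}(R) - \epsilon_2)} + \sum_{n=\beta\Delta}^{\gamma\Delta} K_2 2^{-\Delta N\left(\min_{\alpha>1}\left\{\frac{E_{ei,b}(\alpha R)}{\alpha-1}\right\} - \epsilon_2\right)}\\
&\le_{(c)} K_3 2^{-\gamma\Delta N(E_{ei,b}(R) - \epsilon_2)} + |\gamma\Delta - \beta\Delta|\, K_3 2^{-\Delta N(E_{ei}(R) - \epsilon_2)}\\
&\le_{(d)} K 2^{-\Delta N(E_{ei}(R) - \epsilon)}
\end{aligned}
$$

where the large $K_i$'s and arbitrarily tiny $\epsilon_i$'s are properly chosen real numbers. (a) is true because of Lemma 1.
Letting $\gamma = \frac{E_{ei}(R)}{E_{ei,b}(R)}$ in the first part of (b), we only need the fact that $E_{ei,b}(R)$ is non-decreasing with $R$. In the
second part of (b), let $\alpha = \frac{n+\Delta}{n}$, so that $nN E_{ei,b}(\alpha R) = \Delta N \frac{E_{ei,b}(\alpha R)}{\alpha - 1}$, and choose the $\alpha$ to minimize the error exponents. The first term of (c) comes
from the sum of a geometric series. The second term of (c) follows from the definition of $E_{ei}(R)$ in (10). (d)
follows from the definition of $\gamma$ above and by absorbing the linear term into the $\epsilon$ in the exponent. ■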



B. Converse 

The idea is to bound the best possible error exponent with fixed delay, without making any assumptions on 
the implementation of the encoder and decoder beyond the fixed end-to-end delay constraint. In particular, no 
assumption is made that the encoder works by encoding source symbols in small groups and then uses a queue to 
smooth out the rate. Instead, an encoder/decoder pair is considered that uses the fixed-delay system to construct a 
fixed-block-length system. The block-coding bounds of [7] are thereby translated to the fixed delay context. The 
arguments are analogous to the "uncertainty-focusing bound" derivation in [5] for the case of channel coding with 
feedback and the techniques originate in the convolutional code literature [23]. 

Proof: For simplicity of exposition, we ignore integer effects arising from the finite nature of $\Delta$ and $R$ etc.
For every $\alpha > 0$ and delay $\Delta$, consider a code running at fixed rate till time $\frac{\Delta}{\alpha} + \Delta$. By this time, the decoder
has committed to estimates for the source symbols up to time $i = \frac{\Delta}{\alpha}$. The total number of bits generated by the
encoder during this period is $(\frac{\Delta}{\alpha} + \Delta)R$.

Now, relax the causality constraint at the encoder by giving it access to the first $i$ source symbols all at once at
the beginning of time, rather than forcing the encoder to get the source symbols gradually. Simultaneously, loosen
the deadlines at the decoder to only demand correct estimates for the first $i$ source symbols by the time $\frac{\Delta}{\alpha} + \Delta$. In
effect, the deadline for decoding the past source symbols is extended to the deadline of the $i$-th symbol itself.

Any lower bound to the symbol error probability of the new problem is clearly also a bound for the original
problem. The difference between block error probability and symbol error probability is at most a factor of $i$ and
is insignificant on the exponential scale. Furthermore, the new problem is just a fixed-block-length source coding
problem requiring the encoding of $i$ source symbols into $(\frac{\Delta}{\alpha} + \Delta)R$ bits. The rate per symbol is

$$\Big(\big(\tfrac{\Delta}{\alpha} + \Delta\big)R\Big)\frac{1}{i} = \Big(\big(\tfrac{\Delta}{\alpha} + \Delta\big)R\Big)\frac{\alpha}{\Delta} = (\alpha + 1)R.$$

Theorem 2.15 in [7] tells us that such a code has a probability of error that is at least exponential in $i E_{ei,b}((\alpha + 1)R)$.
Since $i = \frac{\Delta}{\alpha}$, this translates into an error exponent of at most $\frac{E_{ei,b}((\alpha+1)R)}{\alpha}$ with parameter $\Delta$.

Since this is true for all $\alpha > 0$, we have the uncertainty-focusing bound on the reliability function $E_{ei}(R)$ with
fixed delay $\Delta$:

$$E_{ei}(R) \le \inf_{\alpha > 0} \frac{1}{\alpha} E_{ei,b}((\alpha + 1)R) \tag{17}$$

The minimizing $\alpha$ tells how much of the past ($\frac{\Delta}{\alpha}$) is involved in the dominant error event.

The source uncertainty-focusing bound can be expressed parametrically in terms of the Gallager function $E_0(\rho)$
from (6) and its slope computed in the vicinity of the conditional entropy. This is shown in Appendix II. ■
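
A numerical sanity check of the uncertainty-focusing bound can be sketched for the simplest case of a binary memoryless source with no side-information. The sketch assumes a Bernoulli($p$) source with $p < 1/2$, for which the block exponent is $D(q^*\|p)$ with $h(q^*) = R$, and compares a grid search over $\alpha$ against the parametric form $\rho^* R$ with $\rho^* R = E_0(\rho^*)$. All function names, the grid size, and the example values $p = 0.1$, $R = 0.6$ are illustrative assumptions.

```python
import math


def binary_entropy(q):
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)


def kl_divergence(q, p):
    """D(Bernoulli(q) || Bernoulli(p)) in bits."""
    return sum(a * math.log2(a / b) for a, b in ((q, p), (1 - q, 1 - p)) if a > 0)


def block_exponent(rate, p):
    """min D(q||p) over h(q) >= rate, assuming p < 1/2."""
    if rate <= binary_entropy(p):
        return 0.0
    if rate >= 1.0:
        return math.inf
    lo, hi = p, 0.5
    for _ in range(100):  # bisection for h(q) = rate on [p, 1/2]
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if binary_entropy(mid) < rate else (lo, mid)
    return kl_divergence(0.5 * (lo + hi), p)


def gallager_e0(rho, p):
    s = 1.0 / (1.0 + rho)
    return (1.0 + rho) * math.log2(p ** s + (1 - p) ** s)


def focusing_bound_grid(rate, p, steps=4000):
    """inf over alpha of E_b((1 + alpha) R) / alpha, by grid search."""
    alpha_max = 1.0 / rate - 1.0  # beyond this the block exponent is infinite
    alphas = (alpha_max * k / steps for k in range(1, steps))
    return min(block_exponent((1 + a) * rate, p) / a for a in alphas)


def focusing_bound_parametric(rate, p):
    """rho* R where rho* R = E0(rho*), found by bisection on E0(rho)/rho."""
    lo, hi = 1e-9, 1.0
    while gallager_e0(hi, p) / hi < rate:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if gallager_e0(mid, p) / mid < rate else (lo, mid)
    return rate * 0.5 * (lo + hi)


grid_value = focusing_bound_grid(0.6, 0.1)
parametric_value = focusing_bound_parametric(0.6, 0.1)
```

The grid search can only overestimate the infimum, so a small positive gap between the two values is expected from discretization alone.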



V. NO SIDE-INFORMATION AT THE ENCODER 

This section proves the upper bound given by Theorem 3 for the fixed-delay error exponent for source coding
without encoder side-information. This bound is valid for any generic joint distribution $p_{xy}$. The results are
specialized to the symmetric case in Corollary 1, proved in Appendix VIII.

In the following analysis, it is conceptually useful to factor the joint probability to treat the source as a random
variable $x$ and consider the side-information $y$ as the output of a discrete memoryless channel (DMC) $p_{y|x}$ with $x$
as input. This model is shown in Figure 15.



[Figure 15 block diagram. Labels: encoder; decoder; DMC with outputs $y_1, y_2, \ldots$]



Fig. 15. Lossless source coding with side-information only at the decoder. 



The theorem is proved using a variation of the bounding technique used in [5] (and originating in [15]) for the
fixed-delay channel coding problem. Lemmas 2-7 are the source coding counterparts to Lemmas 4.1-4.4 in [5].
The idea of the proof is to assume a more powerful source decoder that has access to the previous source symbols 
(considered as feed-forward information) in addition to the encoded bits and the side-information. The second step 
is to construct a fixed-block-length source-coding scheme from the encoder and optimal feed-forward decoder. The 
third step is to show that if the side-information behaves atypically enough, then the decoding error probability 
will be large for many of the source symbols. The fourth step is to show that it is only future atypicality of the 
side-information that matters. This is because the feed-forward information allows the decoder to safely ignore all 
side-information concerning the source symbols that it already knows perfectly. The last step is to lower bound 
the probability of the atypical behavior and upper bound the error exponents. The proof spans the next several 
subsections. 

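1) Feed-forward decoders:

Definition 4: A delay $\Delta$ rate $R$ decoder $\mathcal{D}^{\Delta,R}$ with feed-forward is a decoder $\mathcal{D}_j^{\Delta,R}$ that also has access to the
past source symbols $x_1^{j-1}$ in addition to the encoded bits $b_1^{\lfloor (j+\Delta)R \rfloor}$ and side-information $y_1^{j+\Delta}$.
Using this feed-forward decoder, the estimate of $x_j$ at time $j + \Delta$ is:

$$\hat{x}_j(j + \Delta) = \mathcal{D}_j^{\Delta,R}\big(b_1^{\lfloor (j+\Delta)R \rfloor}, y_1^{j+\Delta}, x_1^{j-1}\big) \tag{18}$$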


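[Figure 16 block diagram. Labels: causal encoder; feed-forward delay; delay $\Delta$ feed-forward decoder 1; delay $\Delta$ feed-forward decoder 2.]

Fig. 16. A cutset illustration of the Markov chain $x_1^n - (\tilde{x}_1^n, b_1^{\lfloor (n+\Delta)R \rfloor}, y_1^{n+\Delta}) - x_1^n$. Decoder 1 and decoder 2 are type I and II
delay $\Delta$ rate $R$ feed-forward decoders respectively.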
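Lemma 2: For any rate $R$ encoder $\mathcal{E}$, the optimal delay $\Delta$ rate $R$ decoder $\mathcal{D}^{\Delta,R}$ with feed-forward only needs
to depend on $b_1^{\lfloor (j+\Delta)R \rfloor}, y_j^{j+\Delta}, x_1^{j-1}$.

Proof: The source and side-information pair $(x_i, y_i)$ is an iid random process and the encoded bits $b_1^{\lfloor (j+\Delta)R \rfloor}$
are causal functions of $x_1^{j+\Delta}$. It is easy to see that the Markov chain $y_1^{j-1} - (x_1^{j-1}, b_1^{\lfloor (j+\Delta)R \rfloor}, y_j^{j+\Delta}) - x_j^{j+\Delta}$
holds since

$$
\begin{aligned}
&\Pr\big(x_1^{j-1}, b_1^{\lfloor (j+\Delta)R \rfloor}, y_j^{j+\Delta}\big)\, \Pr\big(x_j^{j+\Delta} \mid x_1^{j-1}, b_1^{\lfloor (j+\Delta)R \rfloor}, y_j^{j+\Delta}\big)\, \Pr\big(y_1^{j-1} \mid x_1^{j-1}, b_1^{\lfloor (j+\Delta)R \rfloor}, y_j^{j+\Delta}, x_j^{j+\Delta}\big)\\
&\quad = \Pr\big(x_1^{j-1}, b_1^{\lfloor (j+\Delta)R \rfloor}, y_j^{j+\Delta}\big)\, \Pr\big(x_j^{j+\Delta} \mid x_1^{j-1}, b_1^{\lfloor (j+\Delta)R \rfloor}, y_j^{j+\Delta}\big)\, \Pr\big(y_1^{j-1} \mid x_1^{j-1}\big)
\end{aligned}
$$

Thus, conditioned on the past source symbols, the past side-information is completely irrelevant for optimal MAP
estimation of $x_j$. □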

Write the error sequence of the feed-forward decoder as $\tilde{x}_j = x_j - \hat{x}_j$ by identifying the finite source alphabet
with the appropriate finite group. Then we have the following property for the feed-forward decoders.

Lemma 3: Given a rate $R$ encoder $\mathcal{E}$, the optimal delay $\Delta$ rate $R$ decoder $\mathcal{D}^{\Delta,R}$ with feed-forward for symbol
$j$ only needs to depend on $b_1^{\lfloor (j+\Delta)R \rfloor}, y_1^{j+\Delta}, \tilde{x}_1^{j-1}$.

Proof: Proceed by induction. It holds for $j = 1$ since there are no prior source symbols. Suppose that it
holds for all $j < k$ and consider $j = k$. By the induction hypothesis, the action of all the prior decoders $j$ can be
simulated using $(b_1^{\lfloor (j+\Delta)R \rfloor}, y_1^{j+\Delta}, \tilde{x}_1^{j-1})$, giving $\hat{x}_1^{k-1}$. This in turn allows the recovery of $x_1^{k-1}$ since we also know
$\tilde{x}_1^{k-1}$. Thus the optimal feed-forward decoder can be expressed in this form. □

We call the feed-forward decoders in Lemmas 2 and 3 type I and II delay $\Delta$ rate $R$ feed-forward decoders
respectively. Lemmas 2 and 3 tell us that feed-forward decoders can be thought of in three ways: having access to
all encoded bits, all side information and all past source symbols, $(b_1^{\lfloor (j+\Delta)R \rfloor}, y_1^{j+\Delta}, x_1^{j-1})$; having access to all
encoded bits, a recent window of side information and all past source symbols, $(b_1^{\lfloor (j+\Delta)R \rfloor}, y_j^{j+\Delta}, x_1^{j-1})$; or having
access to all encoded bits, all side information and all past decoding errors, $(b_1^{\lfloor (j+\Delta)R \rfloor}, y_1^{j+\Delta}, \tilde{x}_1^{j-1})$.

2) Constructing a block code: To encode a block of $n$ source symbols, just run the rate $R$ encoder $\mathcal{E}$ and
terminate with the encoder run using some $\Delta$ random source symbols drawn according to the distribution of $p_x$. To
decode the block, just use the delay $\Delta$ rate $R$ decoder $\mathcal{D}^{\Delta,R}$ with feed-forward, and then further use the feed-forward
error signals to correct any mistakes that might have occurred. As a block coding system, this hypothetical system
never makes an error from end to end. As shown in Figure 16, the data processing inequality implies:

Lemma 4: If $n$ is the fixed block-length, and the block rate is $R(1 + \frac{\Delta}{n})$, then

$$H(\tilde{x}_1^n) \ge -(n + \Delta)R + nH(x|y) \tag{19}$$

Proof: See Appendix III.

3) Lower bound the symbol-wise error probability: Now suppose this block code were to be run with the
distribution $q_{xy}$, s.t. $H(q_{x|y}) > (1 + \frac{\Delta}{n})R$, from time 1 to $n$, and were to be run with the distribution $p_{xy}$ from time
$n + 1$ to $n + \Delta$. Write the hybrid distribution as $Q_{xy}$. Then the block coding scheme constructed in the previous
section would with probability very close to 1 make a block error. Moreover, many individual symbols would also
be in error often:

Lemma 5: If the source and side-information are coming from $q_{xy}$, then there exists a $\delta > 0$ so that for $n$ large
enough, there exists a number $n_e$ and a sequence of symbol positions $j_1 < j_2 < \cdots < j_{n_e}$ satisfying:

• $n_e \ge \dfrac{H(q_{x|y}) - \frac{n+\Delta}{n}R}{2\log_2|\mathcal{X}| - \big(H(q_{x|y}) - \frac{n+\Delta}{n}R\big)}\, n$.

• The probability of symbol errors made by the feed-forward decoder on symbol $x_{j_i}$ is at least $\delta$ when the joint
source symbols are drawn according to $q_{xy}$.

• $\delta$ satisfies $h_\delta + \delta\log_2(|\mathcal{X}| - 1) = \frac{1}{2}\big(H(q_{x|y}) - \frac{n+\Delta}{n}R\big)$ where $h_\delta = -\delta\log_2\delta - (1 - \delta)\log_2(1 - \delta)$.

Proof: See Appendix IV.
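
The quantities in Lemma 5 can be evaluated numerically. The sketch below is illustrative only: it solves the Fano-type equation for $\delta$ by bisection on the interval where the left-hand side is increasing, and evaluates the guaranteed fraction $n_e/n$ of error-prone positions. The function names and the example numbers are assumptions introduced here.

```python
import math


def fano_function(delta, alphabet_size):
    """h(delta) + delta * log2(|X| - 1), the left-hand side of the delta equation."""
    if delta <= 0.0 or delta >= 1.0:
        h = 0.0
    else:
        h = -delta * math.log2(delta) - (1 - delta) * math.log2(1 - delta)
    return h + delta * math.log2(alphabet_size - 1)


def solve_delta(target, alphabet_size):
    """Solve fano_function(delta) = target for 0 <= target <= log2 |X|."""
    # The function is increasing on [0, (|X| - 1) / |X|], so bisection applies.
    lo, hi = 0.0, (alphabet_size - 1) / alphabet_size
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if fano_function(mid, alphabet_size) < target else (lo, mid)
    return 0.5 * (lo + hi)


def error_position_fraction(h_cond, rate, n, delay, alphabet_size):
    """Lower bound on n_e / n from the first bullet of Lemma 5."""
    gap = h_cond - (n + delay) / n * rate
    return gap / (2 * math.log2(alphabet_size) - gap)


example_delta = solve_delta(0.5, 2)
example_fraction = error_position_fraction(0.9, 0.5, 100, 20, 2)
```

Both quantities depend on the source and on the ratio of delay to block length only through the entropy gap $H(q_{x|y}) - \frac{n+\Delta}{n}R$, which is why they stay bounded away from zero when $\Delta$ and $n$ grow proportionally.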



Pick $j^* = j_{n_e/2}$ to pick a symbol position in the middle of the block that is subject to errors. Lemma 5 reveals
that

$$\min\{j^*, n - j^*\} \ge \frac{1}{2}\cdot\frac{H(q_{x|y}) - \frac{n+\Delta}{n}R}{2\log_2|\mathcal{X}| - \big(H(q_{x|y}) - \frac{n+\Delta}{n}R\big)}\, n,$$

so if we fix $\frac{\Delta}{n}$ and let $n$ go to infinity, then $\min\{j^*, n - j^*\}$ goes to infinity as well.
At this point, Lemma 2 implies that the decoder can ignore the side-information from the past. Define the "bad
sequence" set $E_{j^*}$ as the set of source and side-information sequence pairs so the type I delay $\Delta$ rate $R$ decoder
makes a symbol error at $j^*$. To simplify notation, let $\vec{x} = x_1^{j^*+\Delta}$, $\bar{x} = x_1^{j^*-1}$, $\bar{\bar{x}} = x_{j^*}^{j^*+\Delta}$, $\bar{\bar{y}} = y_{j^*}^{j^*+\Delta}$ denote the
entire source vector, the source prefix, and the suffixes for the source and side-information respectively. Define
$E_{j^*} = \{(\vec{x}, \bar{\bar{y}}) \mid x_{j^*} \ne \mathcal{D}_{j^*}^{\Delta,R}(\mathcal{E}(\vec{x}), \bar{\bar{y}}, \bar{x})\}$.

Since the "bad sequence" set $E_{j^*}$ only has future side-information in it, the probability of this set depends only
on the marginals for $x$ in the past and the joint distribution in the present and future. Consider a hybrid distribution
where the joint source behaves according to $Q_{xy}$ from time $j^*$ to $j^* + \Delta$ and the $x$ source alone behaves like it came
from a distribution $q_x$ from time 1 to $j^* - 1$. By Lemma 5, $Q_{xy}(E_{j^*}) \ge \delta$.

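Define $J = \min\{n, j^* + \Delta\}$ to deal with possible edge-effects$^7$ near the end of the block, and let $\check{x} = x_{j^*}^{J}$,
$\check{y} = y_{j^*}^{J}$. The empirical distribution of $(\check{x}, \check{y})$ is written using shorthand $r_{\check{x},\check{y}}(x, y) = \frac{n_{x,y}(\check{x},\check{y})}{J - j^* + 1}$ and similarly the
empirical distribution of the source prefix $\bar{x} = x_1^{j^*-1}$ as $r_{\bar{x}}(x) = \frac{n_x(\bar{x})}{j^* - 1}$, where $n_{x,y}(\cdot)$ and $n_x(\cdot)$ count occurrences of the pair $(x, y)$ and of the letter $x$.
Now, the strongly typical set can be defined

$$
\begin{aligned}
A_J^\epsilon(q_{xy}) = \big\{ (\vec{x}, \bar{\bar{y}}) \in \mathcal{X}^{j^*+\Delta} \times \mathcal{Y}^{\Delta+1} \ \big|\ & \forall x:\ r_{\bar{x}}(x) \in (q_x(x) - \epsilon,\, q_x(x) + \epsilon),\\
& \forall (x, y) \in \mathcal{X} \times \mathcal{Y},\ q_{xy}(x, y) > 0:\ r_{\check{x},\check{y}}(x, y) \in (q_{xy}(x, y) - \epsilon,\, q_{xy}(x, y) + \epsilon),\\
& \forall (x, y) \in \mathcal{X} \times \mathcal{Y},\ q_{xy}(x, y) = 0:\ r_{\check{x},\check{y}}(x, y) = 0 \big\}. \qquad (20)
\end{aligned}
$$

The conditions require that the prefix be $q_x$-typical and the suffix till $J$ be $q_{xy}$-typical. What happens after $J$ is not
important.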

This typical set is used to get a sequence of lemmas asserting that errors are common even when we restrict to the
typical behavior of the $q$ distribution, that the probability of $q$-typical joint realizations is lower-bounded by an exponentially small quantity
under the true distribution, and that this means that the errors themselves must occur at least with exponentially
small probability.

Lemma 6: $Q_{xy}(E_{j^*} \cap A_J^\epsilon(q_{xy})) \ge \frac{\delta}{2}$ for large $n$ and $\Delta$.
Proof: See Appendix V.

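Lemma 7: For all $\epsilon < \min_{x,y:\, p_{xy}(x,y) > 0}\{p_{xy}(x, y)\}$ and all $(\vec{x}, \bar{\bar{y}}) \in A_J^\epsilon(q_{xy})$,

$$\frac{p_{xy}(\vec{x}, \bar{\bar{y}})}{Q_{xy}(\vec{x}, \bar{\bar{y}})} \ge 2^{-(J - j^* + 1)D(q_{xy}\|p_{xy}) - (j^* - 1)D(q_x\|p_x) - JG\epsilon}$$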

where G = max^l^K Y, x , y : Pxy ( x , y)>0 k&CgjsS + ^ ^ + ^^(^ + 1)} 



Proof: See Appendix VI.

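Lemma 8: For all $\epsilon < \min_{x,y}\{p_{xy}(x, y)\}$, and large $\Delta$, $n$:

$$p_{xy}(E_{j^*}) \ge \frac{\delta}{2}\, 2^{-(J - j^* + 1)D(q_{xy}\|p_{xy}) - (j^* - 1)D(q_x\|p_x) - JG\epsilon}$$

Proof: See Appendix VII.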

4) Final details in proving Theorem 3: Notice that as long as $H(q_{x|y}) > R$, we know $\delta > 0$ by letting $\epsilon$ go
to 0, and having $\Delta$ and $n$ (and thus also $J$) go to infinity proportionally. So

$$\Pr[\hat{x}_{j^*}(j^* + \Delta) \ne x_{j^*}] = p_{xy}(E_{j^*}) \ge K 2^{-(J - j^* + 1)D(q_{xy}\|p_{xy}) - (j^* - 1)D(q_x\|p_x)}.$$

Notice that $D(q_{xy}\|p_{xy}) \ge D(q_x\|p_x)$. Since $J = \min\{n, j^* + \Delta\}$, for all possible $j^* \in [1, n]$ we have for all
$n \ge \Delta$:

$$
\begin{aligned}
(J - j^* + 1)D(q_{xy}\|p_{xy}) + (j^* - 1)D(q_x\|p_x) &\le (\Delta + 1)D(q_{xy}\|p_{xy}) + (n - \Delta - 1)D(q_x\|p_x)\\
&\approx \Delta\Big(D(q_{xy}\|p_{xy}) + \frac{n - \Delta}{\Delta}D(q_x\|p_x)\Big).
\end{aligned}
$$

Meanwhile, for $n < \Delta$:

$$(J - j^* + 1)D(q_{xy}\|p_{xy}) + (j^* - 1)D(q_x\|p_x) \le nD(q_{xy}\|p_{xy}) = \Delta\Big(\frac{n}{\Delta}D(q_{xy}\|p_{xy})\Big).$$
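
The two case bounds can be checked by brute force over every possible position $j^*$. The sketch below is illustrative: the divergence values `d_xy` and `d_x` are arbitrary numbers chosen to satisfy $D(q_{xy}\|p_{xy}) \ge D(q_x\|p_x)$, and the function names are hypothetical.

```python
def exponent_cost(j_star, n, delay, d_xy, d_x):
    """(J - j* + 1) * D(q_xy||p_xy) + (j* - 1) * D(q_x||p_x), with J = min(n, j* + delay)."""
    j_cap = min(n, j_star + delay)
    return (j_cap - j_star + 1) * d_xy + (j_star - 1) * d_x


def worst_case_cost(n, delay, d_xy, d_x):
    """Largest cost over all symbol positions j* in [1, n]."""
    return max(exponent_cost(j, n, delay, d_xy, d_x) for j in range(1, n + 1))


# Illustrative values only; they must satisfy d_xy >= d_x.
d_xy, d_x = 0.5, 0.2
long_block = worst_case_cost(n=50, delay=10, d_xy=d_xy, d_x=d_x)
short_block = worst_case_cost(n=5, delay=10, d_xy=d_xy, d_x=d_x)
```

For $n > \Delta$ the maximum is attained at $j^* = n - \Delta$, where the joint distribution is atypical for the full window of $\Delta + 1$ symbols; for $n < \Delta$ it is attained at $j^* = 1$.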

7 These edge effects, although annoying, cannot be ignored since guaranteeing that S is small relative to n would come at the cost of less 
tight bounds in asymmetric cases. In this way, the situation is different from the argument given in [5] for channel coding without feedback. 
Write $\alpha = \frac{\Delta}{n}$. The upper bound on the error exponent is the minimum of the above error exponents over all
$\alpha > 0$.

$$
E^{u}_{si}(R) = \min\Big\{ \inf_{q_{xy},\,\alpha \ge 1:\ H(q_{x|y}) > (1+\alpha)R} \frac{1}{\alpha}D(q_{xy}\|p_{xy}),\ \ \inf_{q_{xy},\,1 \ge \alpha \ge 0:\ H(q_{x|y}) > (1+\alpha)R} \Big\{\frac{1 - \alpha}{\alpha}D(q_x\|p_x) + D(q_{xy}\|p_{xy})\Big\} \Big\}
$$

This is the desired result. ■
The specialization of this result to uniform sources $x$ and side information $y = x \oplus s$ is straightforward and is
covered in Appendix VIII. The key is to understand that when the joint source is symmetric, the marginal for $q$
always agrees with the marginal for the original $p$.



VI. Conclusions 

This paper has shown that fixed-block-length and fixed-delay lossless source-coding behave very differently 
when decoder side-information is either present or absent at the encoder. While fixed-block-length systems do not 
usually gain substantially in reliability with encoder access to the side-information, fixed-delay systems can achieve 
very substantial gains in reliability. This means that if an application has a target for both end-to-end latency and 
probability of symbol error, then depriving the encoder of access to the side-information will come at the cost of 
higher required data rates. 

The proof of achievability makes clear the connection to ideas of "effective bandwidth" and buffer-provisioning 
in the networking context (see, e.g., [24]). The results here and in [5] can be considered a way to extend the spirit
of those concepts to problems like source-coding without access to side-information and communication without 
feedback. Thinking about buffer-overflow is too narrow a perspective to generalize the idea of "how much extra 
rate is required beyond the minimum" but end-to-end delay provides a framework to understand this and thereby 
compare different approaches. 

Thus, it is useful to view this paper as a companion to its sister paper [5] (treating channel-coding with and 
without feedback) in the fixed-delay context. Comparing both sets of results shows how feedback in channel coding 
is very much like encoder access to decoder side-information in lossless source coding. The main difference is that 
source coding performance is generally better at high rates while channel coding is better at low rates. The subtle 
aspect of the analogy is that lossless source-coding with encoder side-information behaves like channel-coding with 
feedback for channels with strictly positive zero-error capacity. 

• For generic symmetric channels with $C_{0,f} > 0$, the fixed-block-length reliability function is known perfectly
with feedback and jumps abruptly to $\infty$ at $C_{0,f}$ and approaches zero quadratically at $C$.

For generic sources, the fixed-block-length reliability function is known perfectly with encoder side-information
and jumps abruptly to $\infty$ at $\log_2 |\mathcal{X}|$ and approaches zero quadratically at $H(x|y)$.

• For generic symmetric channels with $C_{0,f} > 0$, the fixed-delay reliability with feedback tends smoothly to $\infty$
at $C_{0,f}$ and approaches zero linearly at $C$.

For general sources with encoder access to side-information, the fixed-delay reliability function tends smoothly
to $\infty$ at $\log_2 |\mathcal{X}|$ and approaches zero linearly at $H(x|y)$.

• For generic symmetric channels with $C_{0,f} > 0$, an asymptotically optimal fixed-delay code with feedback can
be constructed using a queue fed at fixed rate followed by a fixed-to-variable channel code.

For generic sources with encoder access to side-information, an asymptotically optimal fixed-delay code can
be constructed using a fixed-to-variable source code followed by a queue drained at fixed rate.
In both cases, the non-ignorant encoders can help deliver substantially lower end-to-end delays. In addition, in 
both cases there is a gap between the achievable regions and converses for fixed delay reliability for asymmetric 
cases when considering ignorant encoders. In addition to closing this gap, many natural problems remain to be 
explored: joint source-channel coding [25], lossy coding [26], as well as extending the upper-bound techniques here 
to truly multi-terminal settings with distributed encoders. 
Appendix I
Proof of Lemma 1

In the large deviation theory literature, limit superior and limit inferior are widely used while calculating the
asymptotic properties of the rate functions [27]. However it is sometimes more convenient to use the following
equivalent $\epsilon$-$K$ conditions since

$$a \le \liminf_{n\to\infty} \frac{1}{n}\log_2 P_n \le \limsup_{n\to\infty} \frac{1}{n}\log_2 P_n \le b$$

iff for all $\epsilon > 0$, there exists $K < \infty$, such that for all $n$: $K2^{n(a-\epsilon)} \le P_n \le K2^{n(b+\epsilon)}$. The equivalence is obvious
from the definitions of limit superior and limit inferior [27].

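Proof: By Cramér's theorem [27], for all $\epsilon_1 > 0$ there exists $K_1$, such that:

$$\Pr\Big(\sum_{i=1}^{n} l(\vec{x}_i, \vec{y}_i) > nNr\Big) = \Pr\Big(\frac{1}{n}\sum_{i=1}^{n} l(\vec{x}_i, \vec{y}_i) > Nr\Big) \le K_1 2^{-n(\inf_{z > Nr} I(z) - \epsilon_1)} \tag{21}$$

where the rate function $I(z)$ is [27]:

$$I(z) = \sup_{\rho \in \mathbb{R}}\Big\{\rho z - \log_2\Big(\sum_{(\vec{x},\vec{y}) \in \mathcal{X}^N \times \mathcal{Y}^N} p_{xy}(\vec{x}, \vec{y})\, 2^{\rho l(\vec{x},\vec{y})}\Big)\Big\} \tag{22}$$

Write $I(z, \rho) = \rho z - \log_2\Big(\sum_{(\vec{x},\vec{y}) \in \mathcal{X}^N \times \mathcal{Y}^N} p_{xy}(\vec{x}, \vec{y})\, 2^{\rho l(\vec{x},\vec{y})}\Big)$.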

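Notice that the Hölder inequality implies that for all $\rho_1, \rho_2$ and for all $\theta \in (0, 1)$:

$$\Big(\sum_{(\vec{x},\vec{y})} p_{xy}(\vec{x}, \vec{y})\, 2^{\rho_1 l(\vec{x},\vec{y})}\Big)^{\theta}\Big(\sum_{(\vec{x},\vec{y})} p_{xy}(\vec{x}, \vec{y})\, 2^{\rho_2 l(\vec{x},\vec{y})}\Big)^{1-\theta} \ge \sum_{(\vec{x},\vec{y})} p_{xy}(\vec{x}, \vec{y})\, 2^{(\theta\rho_1 + (1-\theta)\rho_2) l(\vec{x},\vec{y})}$$

This shows that $\log_2\big(\sum_{(\vec{x},\vec{y}) \in \mathcal{X}^N \times \mathcal{Y}^N} p_{xy}(\vec{x}, \vec{y})\, 2^{\rho l(\vec{x},\vec{y})}\big)$ is a convex $\cup$ function of $\rho$ and thus $I(z, \rho)$ is a concave $\cap$
function of $\rho$ for fixed $z$. Clearly $I(z, 0) = 0$. Consider $z > Nr > NH(x|y)$. For large $N$,

$$\frac{\partial I(z, \rho)}{\partial \rho}\Big|_{\rho=0} = z - \sum_{(\vec{x},\vec{y})} p_{xy}(\vec{x}, \vec{y})\, l(\vec{x}, \vec{y}) > 0$$

since the average codeword length is essentially $NH(x|y)$. Thus $I(z, \rho) < 0$ as long as $z > Nr$ and $\rho < 0$. This
means that the $\rho$ to maximize $I(z, \rho)$ is positive. So from now on, it is safe to assume $\rho \ge 0$. This implies that
$I(z)$ is monotonically increasing with $z$ and it is obvious that $I(z)$ is continuous. Thus

$$\inf_{z > Nr} I(z) = I(Nr). \tag{23}$$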

z>Nr 

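Using the upper bound on $l(\vec{x}, \vec{y})$ in (14):

$$
\begin{aligned}
\log_2\Big(\sum_{(\vec{x},\vec{y}) \in \mathcal{X}^N \times \mathcal{Y}^N} p_{xy}(\vec{x}, \vec{y})\, 2^{\rho l(\vec{x},\vec{y})}\Big) &\le \log_2\Big(\sum_{q_{xy} \in \mathcal{T}^N} 2^{-ND(q_{xy}\|p_{xy})}\, 2^{\rho N(\epsilon_N + H(q_{x|y}))}\Big)\\
&\le \log_2\Big(2^{N\epsilon_N}\, 2^{-N\min_q\{D(q_{xy}\|p_{xy}) - \rho H(q_{x|y}) - \rho\epsilon_N\}}\Big)\\
&= N\Big(-\min_q\{D(q_{xy}\|p_{xy}) - \rho H(q_{x|y}) - \rho\epsilon_N\} + \epsilon_N\Big)
\end{aligned}
$$

where $0 \le \epsilon_N \le \frac{2 + |\mathcal{X}||\mathcal{Y}|\log_2(N+1)}{N}$ goes to 0 as $N$ goes to infinity and $\mathcal{T}^N$ is the set of all joint types of $\mathcal{X}^N \times \mathcal{Y}^N$.
Substitute the above inequalities into $I(Nr)$ defined in (22):

$$I(Nr) \ge N\Big(\sup_{\rho \ge 0}\big\{\min_q\ \rho(r - H(q_{x|y}) - \epsilon_N) + D(q_{xy}\|p_{xy})\big\} - \epsilon_N\Big) \tag{24}$$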
The next task is to show that $I(Nr) \ge N(E_{ei,b}(r) + \epsilon)$ where $\epsilon$ goes to 0 as $N$ goes to infinity. This can be
proved by the tedious but direct Lagrange multiplier method used in [17]. Instead, the proof here is based on the
existence of a saddle point. Define

$$f(q, \rho) = \rho(r - H(q_{x|y}) - \epsilon_N) + D(q_{xy}\|p_{xy}).$$

Clearly for fixed $q$, $f(q, \rho)$ is a linear function of $\rho$, and thus concave. In addition, for fixed $\rho \ge 0$, $f(q, \rho)$
is a convex $\cup$ function of $q$, because both $-H(q_{x|y})$ and $D(q_{xy}\|p_{xy})$ are convex $\cup$ on $q_{xy}$. Define $g(u) =
\min_q \sup_{\rho \ge 0}(f(q, \rho) + \rho u)$. It is enough to show that $g(u)$ is finite in the neighborhood of $u = 0$ to establish
the existence of the saddle point [28].

$$
\begin{aligned}
g(u) &=_{(a)} \min_q \sup_{\rho \ge 0}\ f(q, \rho) + \rho u\\
&=_{(b)} \min_q \sup_{\rho \ge 0}\ \rho(r - H(q_{x|y}) - \epsilon_N + u) + D(q_{xy}\|p_{xy})\\
&\le_{(c)} \min_{q:\ H(q_{x|y}) \ge r - \epsilon_N + u}\ \sup_{\rho \ge 0}\ \rho(r - H(q_{x|y}) - \epsilon_N + u) + D(q_{xy}\|p_{xy})\\
&\le_{(d)} \min_{q:\ H(q_{x|y}) \ge r - \epsilon_N + u} D(q_{xy}\|p_{xy})\\
&<_{(e)} \infty \qquad (25)
\end{aligned}
$$

(a), (b) are from the definitions. (c) is true because $H(p_{x|y}) < r < \log_2 |\mathcal{X}|$ and thus for very small $\epsilon_N$ and $u$,
$H(p_{x|y}) < r - \epsilon_N + u < \log_2 |\mathcal{X}|$. Consequently, there exists a distribution $q$ so that $H(q_{x|y}) \ge r - \epsilon_N + u$. (d)
holds because $H(q_{x|y}) \ge r - \epsilon_N + u$ and $\rho \ge 0$. (e) is true because we assumed without loss of generality that
the marginal $p_x(x) > 0$ for all $x \in \mathcal{X}$ together with the fact that $r - \epsilon_N + u < \log_2 |\mathcal{X}|$. The finiteness implies the
existence of the saddle point of $f(q, \rho)$.

$$\sup_{\rho \ge 0}\{\min_q f(q, \rho)\} = \min_q\{\sup_{\rho \ge 0} f(q, \rho)\} \tag{26}$$

Note that if $H(q_{x|y}) < r - \epsilon_N$, then $\rho$ can be chosen to be arbitrarily large to make $\rho(r - H(q_{x|y}) - \epsilon_N) +
D(q_{xy}\|p_{xy})$ arbitrarily large. Thus the $q$ to minimize $\sup_\rho \rho(r - H(q_{x|y}) - \epsilon_N) + D(q_{xy}\|p_{xy})$ satisfies $r - H(q_{x|y}) -
\epsilon_N \le 0$. Thus

$$
\begin{aligned}
\min_q\{\sup_{\rho \ge 0}\ \rho(r - H(q_{x|y}) - \epsilon_N) + D(q_{xy}\|p_{xy})\} &=_{(a)} \min_{q:\ H(q_{x|y}) \ge r - \epsilon_N}\ \sup_{\rho \ge 0}\{\rho(r - H(q_{x|y}) - \epsilon_N) + D(q_{xy}\|p_{xy})\}\\
&=_{(b)} \min_{q:\ H(q_{x|y}) \ge r - \epsilon_N}\{D(q_{xy}\|p_{xy})\}\\
&=_{(c)} E_{ei,b}(r - \epsilon_N). \qquad (27)
\end{aligned}
$$

(a) follows from the argument above. (b) is true because $r - H(q_{x|y}) - \epsilon_N \le 0$ and $\rho \ge 0$ and thus $\rho = 0$
maximizes $\rho(r - H(q_{x|y}) - \epsilon_N)$. (c) is true by definition. Combining (24), (26) and (27), letting $N$ be sufficiently
big implies that $\epsilon_N$ is sufficiently small. Noticing that $E_{ei,b}(r)$ is continuous in $r$, we get the desired bound in
(15). □



Appendix II 
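Parametrization of $E_{ei}(R)$

We need the definition of tilted distributions for a joint distribution $p_{xy}$ from [17].
Definition 5: $x - y$ tilted distribution of $p_{xy}$: $\bar{p}^{\rho}_{xy}$, for all $\rho \in [-1, +\infty)$

$$
\bar{p}^{\rho}_{xy}(x, y) = \frac{\big[\sum_s p_{xy}(s, y)^{\frac{1}{1+\rho}}\big]^{1+\rho}}{\sum_t \big[\sum_s p_{xy}(s, t)^{\frac{1}{1+\rho}}\big]^{1+\rho}} \times \frac{p_{xy}(x, y)^{\frac{1}{1+\rho}}}{\sum_s p_{xy}(s, y)^{\frac{1}{1+\rho}}} \tag{28}
$$

Write the conditional entropy of $x$ given $y$ for this tilted distribution as $H(\bar{p}^{\rho}_{x|y})$. An important fact as shown in
Lemma 17 of [17] is that $\frac{\partial E_0(\rho)}{\partial \rho} = H(\bar{p}^{\rho}_{x|y})$, also $H(\bar{p}^{\rho}_{x|y})|_{\rho=0} = H(p_{x|y})$, $H(\bar{p}^{\rho}_{x|y})|_{\rho=+\infty} = \log_2(M(p_{xy}))$, where

$M(p_{xy}) = \max_y |\{x \in \mathcal{X} : p_{xy}(x, y) > 0\}|$.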
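
The stated derivative identity can be checked numerically on a small joint distribution. The sketch below assumes the Gallager function in the form $E_0(\rho) = \log_2 \sum_y (\sum_x p_{xy}(x,y)^{1/(1+\rho)})^{1+\rho}$ and the tilted distribution of Definition 5; the function names, the example distribution, and the finite-difference step are hypothetical choices made here.

```python
import math


def inner_sums(p_xy, rho):
    """For each y, the sum over x of p_xy(x, y) ** (1 / (1 + rho))."""
    s = 1.0 / (1.0 + rho)
    sums = {}
    for (_, y), p in p_xy.items():
        if p > 0:
            sums[y] = sums.get(y, 0.0) + p ** s
    return sums


def gallager_e0(p_xy, rho):
    return math.log2(sum(a ** (1.0 + rho) for a in inner_sums(p_xy, rho).values()))


def tilted_conditional_entropy(p_xy, rho):
    """H(x|y) in bits under the tilted distribution of Definition 5."""
    s = 1.0 / (1.0 + rho)
    sums = inner_sums(p_xy, rho)
    normalizer = sum(a ** (1.0 + rho) for a in sums.values())
    entropy = 0.0
    for (_, y), p in p_xy.items():
        if p <= 0:
            continue
        conditional = p ** s / sums[y]                 # tilted p(x | y)
        weight_y = sums[y] ** (1.0 + rho) / normalizer  # tilted p(y)
        entropy -= weight_y * conditional * math.log2(conditional)
    return entropy


# Hypothetical joint distribution over binary x and y, keyed by (x, y).
example_joint = {(0, 0): 0.4, (1, 0): 0.1, (0, 1): 0.2, (1, 1): 0.3}
step = 1e-5
finite_difference = (gallager_e0(example_joint, 0.7 + step)
                     - gallager_e0(example_joint, 0.7 - step)) / (2 * step)
tilted_entropy = tilted_conditional_entropy(example_joint, 0.7)
```

At $\rho = 0$ the tilt is trivial, so the same function returns the ordinary conditional entropy of the example distribution.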
We first show that $\frac{E_0(\rho)}{\rho}$ is in general monotonically increasing for $\rho \in [0, \infty)$:

$$
\begin{aligned}
\frac{\partial}{\partial \rho}\Big(\frac{E_0(\rho)}{\rho}\Big) &=_{(a)} \frac{\rho\frac{\partial E_0(\rho)}{\partial \rho} - E_0(\rho)}{\rho^2}\\
&=_{(b)} \frac{\rho H(\bar{p}^{\rho}_{x|y}) - E_0(\rho)}{\rho^2}\\
&=_{(c)} \frac{D(\bar{p}^{\rho}_{xy}\|p_{xy})}{\rho^2}\\
&>_{(d)} 0
\end{aligned}
$$

(a) is obvious and (b) is from Lemma 17 in [17]. (c) is from Lemma 15 in [17]. (d) is true unless the source $x$ is
conditionally uniform given side information $y$. For the trivial conditionally uniform case where $p_{x|y}(x|y) =
\frac{1}{M(p_{xy})}$ on those letters $x$ for which it is nonzero, both the fixed-block-length error exponent $E_{ei,b}(R)$ and the delay
error exponent $E_{ei}(R)$ are either 0 when $R < \log_2(M(p_{xy}))$ or $\infty$ when $R > \log_2(M(p_{xy}))$.

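With the above observations, we know that for all $R \in [H(p_{x|y}), \log_2 M(p_{xy}))$, there exists a unique $\rho^* \ge 0$, s.t.
$R = \frac{E_0(\rho^*)}{\rho^*}$ or equivalently $\rho^* R = E_0(\rho^*)$. In order to show (11), it remains to show that $E_{ei}(R) = E_0(\rho^*)$.

From the definition of $E_{ei}(R)$ in (10) and the definition of $E_{ei,b}(R)$ in (5), we have:

$$
\begin{aligned}
E_{ei}(R) &= \inf_{\alpha > 0} \frac{1}{\alpha}E_{ei,b}((\alpha + 1)R)\\
&= \inf_{\alpha > 0}\Big\{\sup_{\rho \ge 0} \frac{\rho(\alpha + 1)R - E_0(\rho)}{\alpha}\Big\}\\
&\ge \sup_{\rho \ge 0}\ \inf_{\alpha > 0}\ \Big(\rho R + \frac{\rho R - E_0(\rho)}{\alpha}\Big)\\
&\ge \inf_{\alpha > 0}\ \Big(\rho^* R + \frac{\rho^* R - E_0(\rho^*)}{\alpha}\Big)\\
&= \rho^* R. \qquad (29)
\end{aligned}
$$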

Now show that $E_{ei}(R) \le \rho^* R$ by writing $\rho(R)$ as the parameter $\rho$ that maximizes $\rho R - E_0(\rho)$. From the
convexity of $E_0(\rho)$ for $\rho \in [0, \infty)$ and the fact that $R \in [H(p_{x|y}), \log_2 M(p_{xy}))$, we know that $\rho(R)$ is the unique
positive real number s.t. $R = \frac{\partial E_0(\rho)}{\partial \rho}\big|_{\rho=\rho(R)} = H(\bar{p}^{\rho}_{x|y})\big|_{\rho=\rho(R)}$. $\rho R - E_0(\rho)$ is a concave $\cap$ function of $\rho$ and
$\rho R - E_0(\rho)|_{\rho=0} = 0$, hence $\rho(R) < \rho^*$ where $\rho(R)$ is the maximal point and $\rho^*$ is the zero point of $\rho R - E_0(\rho)$.
This is illustrated in Figure 17.

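Because $R \in [H(p_{x|y}), \log_2 M(p_{xy}))$ and $\rho(R) < \rho^*$, there exists $R' > R$, s.t. $\rho^* = \rho(R')$, i.e. $\rho^*$ maximizes
$\rho R' - E_0(\rho)$. Now let $\alpha^* = \frac{R'}{R} - 1$, which is positive because $R' > R$. That is, $\rho^*$ maximizes $\rho R' - E_0(\rho) =
\rho(1 + \alpha^*)R - E_0(\rho)$. Plugging this in gives:

$$
\begin{aligned}
E_{ei}(R) &= \inf_{\alpha > 0}\Big\{\sup_{\rho \ge 0} \frac{\rho(\alpha + 1)R - E_0(\rho)}{\alpha}\Big\}\\
&\le \sup_{\rho \ge 0} \frac{\rho(1 + \alpha^*)R - E_0(\rho)}{\alpha^*}\\
&= \frac{\rho^*(1 + \alpha^*)R - E_0(\rho^*)}{\alpha^*}\\
&= \rho^* R = E_0(\rho^*). \qquad (30)
\end{aligned}
$$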

Finally, to get the slope in the vicinity of the conditional entropy, just expand $E_0(\rho)$ around $\rho = 0$ using a Taylor
series. The constant term is zero and Lemma 17 of [17] reveals that the first order term is the conditional entropy
itself. The slope $\frac{dE_0(\rho)/d\rho}{dR/d\rho}$ evaluated at $\rho = 0$ is clearly the first-order term in the Taylor series divided by the second
order term, giving the desired result. The second derivative of $E_0(\rho)$ is only zero when $D(\bar{p}^{\rho}_{xy}\|p_{xy}) = 0$, which
implies that $p_{xy}$ is itself conditionally uniform, resulting in the claimed infinite error exponents.

Fig. 17. Plot of $\rho R - E_0(\rho)$. The maximizing $\rho(R) < \rho^*$, the point where this crosses zero. $R$ is fixed.



Appendix III 
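Proof of Lemma 4

$$
\begin{aligned}
nH(x) &=_{(a)} H(x_1^n)\\
&= I(x_1^n; x_1^n)\\
&=_{(b)} I\big(x_1^n;\ \tilde{x}_1^n, b_1^{\lfloor (n+\Delta)R \rfloor}, y_1^{n+\Delta}\big)\\
&=_{(c)} I\big(x_1^n; y_1^{n+\Delta}\big) + I\big(x_1^n; \tilde{x}_1^n \mid y_1^{n+\Delta}\big) + I\big(x_1^n; b_1^{\lfloor (n+\Delta)R \rfloor} \mid y_1^{n+\Delta}, \tilde{x}_1^n\big)\\
&\le_{(d)} nI(x, y) + H(\tilde{x}_1^n) + H\big(b_1^{\lfloor (n+\Delta)R \rfloor}\big)\\
&\le nH(x) - nH(x|y) + H(\tilde{x}_1^n) + (n + \Delta)R
\end{aligned}
$$

(a) is true because the source is iid. (b) is true because of the data processing inequality considering the following
Markov chain: $x_1^n - (\tilde{x}_1^n, b_1^{\lfloor (n+\Delta)R \rfloor}, y_1^{n+\Delta}) - x_1^n$, thus $I(x_1^n; x_1^n) \le I(x_1^n; \tilde{x}_1^n, b_1^{\lfloor (n+\Delta)R \rfloor}, y_1^{n+\Delta})$. Furthermore,
$I(x_1^n; x_1^n) = H(x_1^n) \ge I(x_1^n; \tilde{x}_1^n, b_1^{\lfloor (n+\Delta)R \rfloor}, y_1^{n+\Delta})$. Combining the two inequalities gives (b). (c) is the chain rule for
mutual information. In (d), first notice that $(x, y)$ are iid across time and thus $I(x_1^n; y_1^{n+\Delta}) = I(x_1^n; y_1^n) = nI(x, y)$.
Second, the entropy of a random variable is never less than the mutual information of that random variable with
another one, conditioned on another random variable or not. □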



Appendix IV 
Proof of Lemma [5] 



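Lemma 4 implies:

$$\sum_{i=1}^{n} H(\tilde{x}_i) \ge H(\tilde{x}_1^n) \ge -(n + \Delta)R + nH(q_{x|y}) \tag{31}$$

The average entropy per source symbol for $\tilde{x}$ is at least $H(q_{x|y}) - \frac{n+\Delta}{n}R$. Now suppose that $H(\tilde{x}_i) \ge \frac{1}{2}\big(H(q_{x|y})
- \frac{n+\Delta}{n}R\big)$ for $n_e$ symbol positions $1 \le j_1 < j_2 < \cdots < j_{n_e} \le n$. By noticing that $H(\tilde{x}_i) \le \log_2 |\mathcal{X}|$, we have

$$\sum_{i=1}^{n} H(\tilde{x}_i) \le n_e \log_2 |\mathcal{X}| + (n - n_e)\frac{1}{2}\Big(H(q_{x|y}) - \frac{n + \Delta}{n}R\Big)$$

Combining this with (31) gives:

$$n_e \ge \frac{H(q_{x|y}) - \frac{n+\Delta}{n}R}{2\log_2 |\mathcal{X}| - \big(H(q_{x|y}) - \frac{n+\Delta}{n}R\big)}\, n \tag{32}$$

where $2\log_2 |\mathcal{X}| - \big(H(q_{x|y}) - \frac{n+\Delta}{n}R\big) \ge 2\log_2 |\mathcal{X}| - H(q_{x|y}) \ge 2\log_2 |\mathcal{X}| - \log_2 |\mathcal{X}| > 0$.

For each of the $j_i$, the individual entropy $H(\tilde{x}_{j_i}) \ge \frac{1}{2}\big(H(q_{x|y}) - \frac{n+\Delta}{n}R\big)$. By the monotonicity of the binary
entropy function, $\Pr(\tilde{x}_{j_i} \ne 0) = \Pr(x_{j_i} \ne \hat{x}_{j_i}) \ge \delta$. □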



Appendix V 
Proof of Lemma [6] 

If we fix $\frac{\Delta}{n}$ and let $n$ go to infinity, then by definition $J = \min\{n, j^* + \Delta\}$ goes to infinity as well. By
Lemma 13.6.1 in [19], it is known that for all $\epsilon > 0$, since $J - j^*$ and $j^*$ both get large with $n$, $Q_{xy}(A_J^\epsilon(q_{xy})^c) \le
\frac{\delta}{2}$. By Lemma 5, $Q_{xy}(E_{j^*}) \ge \delta$. So

$$Q_{xy}(E_{j^*} \cap A_J^\epsilon(q_{xy})) \ge Q_{xy}(E_{j^*}) - Q_{xy}(A_J^\epsilon(q_{xy})^c) \ge \frac{\delta}{2}$$

□



Appendix VI 
Proof of Lemma [7] 

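For $(\vec{x}, \bar{\bar{y}}) \in A_J^\epsilon(q_{xy})$, by the definition of the strongly typical set, it can be easily shown by algebra that
$D(r_{\check{x},\check{y}}\|p_{xy}) \le D(q_{xy}\|p_{xy}) + G\epsilon$ and $D(r_{\bar{x}}\|p_x) \le D(q_x\|p_x) + G\epsilon$, where $\bar{x} = x_1^{j^*-1}$, $\check{x} = x_{j^*}^{J}$ and $\check{y} = y_{j^*}^{J}$. So

$$
\begin{aligned}
\frac{p_{xy}(\vec{x}, \bar{\bar{y}})}{Q_{xy}(\vec{x}, \bar{\bar{y}})} &= \frac{p_x(\bar{x})\, p_{xy}(\check{x}, \check{y})\, p_{xy}\big(x_{J+1}^{j^*+\Delta}, y_{J+1}^{j^*+\Delta}\big)}{q_x(\bar{x})\, q_{xy}(\check{x}, \check{y})\, p_{xy}\big(x_{J+1}^{j^*+\Delta}, y_{J+1}^{j^*+\Delta}\big)}\\
&=_{(a)} \frac{2^{-(J - j^* + 1)(D(r_{\check{x},\check{y}}\|p_{xy}) + H(r_{\check{x},\check{y}}))}\ 2^{-(j^* - 1)(D(r_{\bar{x}}\|p_x) + H(r_{\bar{x}}))}}{2^{-(J - j^* + 1)(D(r_{\check{x},\check{y}}\|q_{xy}) + H(r_{\check{x},\check{y}}))}\ 2^{-(j^* - 1)(D(r_{\bar{x}}\|q_x) + H(r_{\bar{x}}))}}\\
&\ge 2^{-(J - j^* + 1)(D(q_{xy}\|p_{xy}) + G\epsilon) - (j^* - 1)(D(q_x\|p_x) + G\epsilon)}\\
&= 2^{-(J - j^* + 1)D(q_{xy}\|p_{xy}) - (j^* - 1)D(q_x\|p_x) - JG\epsilon}
\end{aligned}
$$

where (a) is true by (12.60) in [19].

□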



Appendix VII 
Proof of Lemma [8] 

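Combining Lemmas 6 and 7:

$$
\begin{aligned}
p_{xy}(E_{j^*}) &\ge p_{xy}(E_{j^*} \cap A_J^\epsilon(q_{xy}))\\
&\ge Q_{xy}(E_{j^*} \cap A_J^\epsilon(q_{xy}))\, 2^{-(J - j^* + 1)D(q_{xy}\|p_{xy}) - (j^* - 1)D(q_x\|p_x) - JG\epsilon}\\
&\ge \frac{\delta}{2}\, 2^{-(J - j^* + 1)D(q_{xy}\|p_{xy}) - (j^* - 1)D(q_x\|p_x) - JG\epsilon}
\end{aligned}
$$

□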
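Appendix VIII
Proof of Corollary 1

Theorem 3 asserts that

$$
\begin{aligned}
E_{si}(R) &\le \min\Big\{ \inf_{q_{xy},\,\alpha \ge 1:\ H(q_{x|y}) > (1+\alpha)R} \frac{1}{\alpha}D(q_{xy}\|p_{xy}),\ \inf_{q_{xy},\,1 \ge \alpha \ge 0:\ H(q_{x|y}) > (1+\alpha)R} \Big\{\frac{1 - \alpha}{\alpha}D(q_x\|p_x) + D(q_{xy}\|p_{xy})\Big\} \Big\}\\
&\le \inf_{q_{xy},\,1 \ge \alpha \ge 0:\ H(q_{x|y}) > (1+\alpha)R} \Big\{\frac{1 - \alpha}{\alpha}D(q_x\|p_x) + D(q_{xy}\|p_{xy})\Big\}\\
&\le \inf_{q_{xy},\,1 \ge \alpha \ge 0:\ H(q_{x|y}) > (1+\alpha)R,\ q_x = p_x} \Big\{\frac{1 - \alpha}{\alpha}D(q_x\|p_x) + D(q_{xy}\|p_{xy})\Big\}\\
&= \inf_{q_{xy},\,1 \ge \alpha \ge 0:\ H(q_{x|y}) > (1+\alpha)R,\ q_x = p_x} \{D(q_{xy}\|p_{xy})\}\\
&= \inf_{q_{xy}:\ H(q_{x|y}) > R,\ q_x = p_x} \{D(q_{xy}\|p_{xy})\} \qquad (33)
\end{aligned}
$$

The next step is to show that (33) is indeed $E_{s,b}(R, p_s)$ for uniform sources $x$ and symmetric side information
$y$, where $x = y \oplus s$.

$$
\begin{aligned}
\inf_{q_{xy}:\ H(q_{x|y}) > R,\ q_x = p_x} \{D(q_{xy}\|p_{xy})\} &=_{(a)} \inf_{q_{xy}:\ H(q_{x|y}) \ge R} \{D(q_{xy}\|p_{xy})\} \qquad (34)\\
&=_{(b)} \max_{\rho \ge 0}\ \rho R - E_0(\rho)\\
&=_{(c)} \max_{\rho \ge 0}\ \rho R - (1 + \rho)\log_2\Big[\sum_s p_s(s)^{\frac{1}{1+\rho}}\Big]\\
&=_{(d)} E_{s,b}(R, p_s)
\end{aligned}
$$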

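(b) follows from (5) in Theorem 1. (c) follows since (6) can be simplified for this case:

$$
\begin{aligned}
E_0(\rho) &= \log_2 \sum_y \Big(\sum_x p_{xy}(x, y)^{\frac{1}{1+\rho}}\Big)^{(1+\rho)}\\
&= \log_2 \sum_y \Big(\sum_x \big(p_y(y)\, p_{x|y}(x|y)\big)^{\frac{1}{1+\rho}}\Big)^{(1+\rho)}\\
&= \log_2 \sum_y p_y(y)\Big(\sum_x p_{x|y}(x|y)^{\frac{1}{1+\rho}}\Big)^{(1+\rho)}\\
&= \log_2 \sum_y p_y(y)\Big(\sum_s p_s(s)^{\frac{1}{1+\rho}}\Big)^{(1+\rho)}\\
&= \log_2 \Big(\sum_s p_s(s)^{\frac{1}{1+\rho}}\Big)^{(1+\rho)}\\
&= (1 + \rho)\log_2\Big(\sum_s p_s(s)^{\frac{1}{1+\rho}}\Big) \qquad (35)
\end{aligned}
$$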

where this clearly matches from © to give us (d). 
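
The simplification in (35) can be verified numerically for a concrete symmetric source. The sketch below assumes a source alphabet identified with the integers modulo $M$, a uniform $x$, and an independent noise term $s$ with $x = y \oplus s$ interpreted as addition modulo $M$; the function names and the noise distribution are hypothetical.

```python
import math


def e0_from_joint(p_xy, rho):
    """E0(rho) = log2 sum_y (sum_x p_xy(x, y)^(1/(1+rho)))^(1+rho)."""
    s = 1.0 / (1.0 + rho)
    sums = {}
    for (_, y), p in p_xy.items():
        if p > 0:
            sums[y] = sums.get(y, 0.0) + p ** s
    return math.log2(sum(a ** (1.0 + rho) for a in sums.values()))


def e0_from_noise(p_s, rho):
    """Closed form (1 + rho) * log2 sum_s p_s(s)^(1/(1+rho))."""
    s = 1.0 / (1.0 + rho)
    return (1.0 + rho) * math.log2(sum(p ** s for p in p_s if p > 0))


def symmetric_joint(p_s):
    """Joint distribution of (x, y) for uniform x and x = y + s (mod M)."""
    size = len(p_s)
    return {(x, y): p_s[(x - y) % size] / size
            for x in range(size) for y in range(size)}


noise = [0.7, 0.2, 0.1]  # hypothetical noise distribution on a ternary alphabet
joint = symmetric_joint(noise)
gaps = [abs(e0_from_joint(joint, rho) - e0_from_noise(noise, rho))
        for rho in (0.3, 1.0, 2.5)]
```

The agreement does not depend on the particular noise distribution, since every column of the joint distribution is a permutation of the scaled noise probabilities.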

Thus, for uniform source $x$ and side information $y = x \ominus s$, the distribution $q_{xy}$ that minimizes the RHS of (34)
is also marginally uniform on $x$ since all that needs to tilt is the distribution for $s$. Hence the constraint on the
marginal $q_x = p_x$ is redundant and (a) is true. □